linux-kernel.vger.kernel.org archive mirror
* AMD Bulldozer FX-8150 Powers off during kernel build
From: Sid Boyce @ 2012-09-13  1:30 UTC
  To: LKML Mailing List

I have a huge heatsink, a large CPU fan and plenty of case fans, and
nothing gets hot.
If I build, e.g., 3.6-rc5 using 8 or 6 cores ("make -j 8" or "make -j 6"),
the machine suddenly powers off partway through the build.

I have looked through hwmon/k10temp.c to find where these values are
defined.

k10temp.h is 0 bytes:
-rw-r--r-- 1 root root 0 Sep  9 01:59 /usr/src/linux-3.6.0-rc5/include/config/sensors/k10temp.h

Currently I build with "make -j 1", and the temperature and power
readings stay around the values below.
# sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +60.4°C  (high = +70.0°C)
                        (crit = +90.0°C, hyst = +87.0°C)

fam15h_power-pci-00c4
Adapter: PCI adapter
power1:      127.49 W  (crit = 124.77 W)
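The `sensors` listing above is plain text. A minimal Python sketch for pulling out each reading together with its limits so they can be compared mechanically. The regular expressions assume the layout shown here (value and unit, then `label = value` pairs, possibly on an indented continuation line); lm-sensors does not promise a stable text format, and `parse_sensors` and `over_limit` are hypothetical helper names.

```python
import re

# A reading line such as "power1:      127.49 W  (crit = 124.77 W)".
# The layout is an assumption based on the output above.
READING = re.compile(r"^(?P<name>\w+):\s+(?P<value>[+-]?\d+(?:\.\d+)?)\s*(?:°C|W)")
# Limit pairs such as "high = +70.0" or "crit = 124.77".
LIMIT = re.compile(r"(?P<label>high|crit|hyst)\s*=\s*(?P<value>[+-]?\d+(?:\.\d+)?)")


def parse_sensors(text):
    """Return {sensor: {"value": float, "high"/"crit"/"hyst": float, ...}}."""
    readings = {}
    current = None
    for line in text.splitlines():
        match = READING.match(line.strip())
        if match:
            current = match.group("name")
            readings[current] = {"value": float(match.group("value"))}
        if current is None:
            continue
        # Limits can sit on the reading line or on a continuation line below it.
        for limit in LIMIT.finditer(line):
            readings[current][limit.group("label")] = float(limit.group("value"))
    return readings


def over_limit(readings, limit="crit"):
    """List the sensors whose current value exceeds the given limit."""
    return [name for name, r in readings.items()
            if limit in r and r["value"] > r[limit]]
```

Fed the FX-8150 listing above, `over_limit(parse_sensors(text))` returns `["power1"]`: even at "make -j 1" the reported power is above the crit value from fam15h_power, while temp1 stays well below its limits.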

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD FX(tm)-8150 Eight-Core Processor
stepping        : 2
microcode       : 0x6000626
cpu MHz         : 3600.000
cache size      : 2048 KB
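/proc/cpuinfo prints the CPU family in decimal. A short Python sketch converting it to the hex form used in AMD documentation and driver names: family 21 is 0x15, i.e. Family 15h (Bulldozer), the family the fam15h_power driver above is named after. `parse_cpuinfo` and `describe_cpu` are hypothetical helpers, and the parser handles a single processor block only.

```python
def parse_cpuinfo(text):
    """Parse one processor block of /proc/cpuinfo into {field: string value}."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_cpu(fields):
    """Summarize family/model/microcode in hex."""
    family = int(fields["cpu family"])        # printed by the kernel in decimal
    model = int(fields["model"])
    microcode = int(fields["microcode"], 16)  # printed as hex, e.g. 0x6000626
    return (f"{fields['model name']}: family {family:#x} ({family}), "
            f"model {model:#x}, microcode {microcode:#x}")
```

Both boxes in this message report family 0x15, model 0x1, stepping 2; in these excerpts the visible difference is the microcode revision (0x6000626 on the FX-8150, 0x6000623 on the FX-6100).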

from .config:-
# grep HWMON .config
CONFIG_IXGBE_HWMON=y
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL_HWMON=y

# grep POWERSAVE .config
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
# CONFIG_PCIEASPM_POWERSAVE is not set
CONFIG_DEVFREQ_GOV_POWERSAVE=y
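A .config records enabled options as `CONFIG_FOO=y` (built in) or `=m` (module), and disabled ones as the comment `# CONFIG_FOO is not set`, which is why both forms appear in the grep output above. A minimal Python sketch that reads that format into a dictionary and repeats the grep as a filter on symbol names; `parse_config` and `grep_config` are hypothetical names.

```python
import re

SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_LINE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map every CONFIG_ symbol in a .config to its value ("n" if not set)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_LINE.match(line)
        if match:
            # "y", "m", a number or a quoted string, kept verbatim
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_LINE.match(line)
        if match:
            options[match.group(1)] = "n"  # disabled options are stored as comments
    return options


def grep_config(options, needle):
    """Rough equivalent of `grep NEEDLE .config`, matching symbol names only."""
    return {name: value for name, value in options.items() if needle in name}
```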

On another 6-core box I can build kernels with "make -j 6" without problems.
# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 1
model name      : AMD FX(tm)-6100 Six-Core Processor
stepping        : 2
microcode       : 0x6000623
cpu MHz         : 3300.000
cache size      : 2048 KB

With a kernel build running on the six-core box, temperature and power
hover around the values below.
sabre:~ # sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +50.2°C  (high = +70.0°C)
                        (crit = +90.0°C, hyst = +87.0°C)

fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       94.40 W  (crit =  95.01 W)
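The two boxes are easier to compare as a ratio than in raw watts, because their crit values differ (124.77 W vs 95.01 W; these appear to correspond to the rated 125 W and 95 W TDPs of the FX-8150 and FX-6100, though that link is an observation, not something stated in this thread). A small Python sketch using only the readings quoted in this message:

```python
# Readings copied from the two `sensors` listings in this message.
READINGS = {
    "FX-8150, make -j 1": (127.49, 124.77),
    "FX-6100, make -j 6": (94.40, 95.01),
}


def percent_of_crit(value_w, crit_w):
    """Express a power reading as a percentage of the reported crit limit."""
    return 100.0 * value_w / crit_w


for label, (value, crit) in READINGS.items():
    print(f"{label}: {value:.2f} W of {crit:.2f} W crit -> "
          f"{percent_of_crit(value, crit):.1f}%")
```

This prints 102.2% for the FX-8150 at "make -j 1" and 99.4% for the FX-6100 at "make -j 6": the six-core box runs just under its limit with all cores busy, while the eight-core box is already above its limit with a single job.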

73 ... Sid.

-- 
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support
Senior Staff Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks



* Re: AMD Bulldozer FX-8150 Powers off during kernel build
From: Borislav Petkov @ 2012-09-13  9:44 UTC
  To: Sid Boyce; +Cc: LKML Mailing List, Andreas Herrmann

On Thu, Sep 13, 2012 at 02:30:27AM +0100, Sid Boyce wrote:
> I have a huge heatsink and large CPU fan plus lots of cooling fans
> in the case and nothing gets hot.
> If I build e.g 3.6-rc5 with 8 or 6 cores, part way through it
> suddenly powers off.

Ok, can you catch the whole dmesg when you boot the machine _after_ the
sudden poweroff? You can send it to me and Andreas (on CC) privately if
you prefer.
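The whole dmesg is what should be sent. For a first look at whether anything machine-check related was logged, a Python sketch that filters a saved log. The pattern is an assumption: it keeps lines mentioning "mce"/"MCE" or "Machine check" and lines carrying the kernel's "[Hardware Error]" tag; compare it against your own log before relying on it. `mce_lines` is a hypothetical helper.

```python
import re

# Assumed markers for machine-check output; adjust to what your log shows.
MCE_PATTERN = re.compile(r"\bmce\b|Machine check|\[Hardware Error\]", re.IGNORECASE)


def mce_lines(dmesg_text):
    """Return the dmesg lines that look machine-check related."""
    return [line for line in dmesg_text.splitlines() if MCE_PATTERN.search(line)]
```

For example, after `dmesg > dmesg.txt` (file name hypothetical), `print("\n".join(mce_lines(open("dmesg.txt").read())))` shows only the candidate lines, while the full file is still what goes to Boris and Andreas.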

Important: make sure the kernel has CONFIG_X86_MCE and
CONFIG_EDAC_DECODE_MCE built-in.
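"Built-in" means `=y` in .config; `=m` builds the option as a loadable module instead. A Python sketch that checks a .config for the two options named above. One plausible reason for wanting them built in is that errors logged before modules load would otherwise go undecoded, but that reasoning is an assumption, not something stated in the thread; `check_builtin` is a hypothetical helper.

```python
import re

REQUIRED_BUILTIN = ("CONFIG_X86_MCE", "CONFIG_EDAC_DECODE_MCE")
SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")


def check_builtin(config_text, required=REQUIRED_BUILTIN):
    """Return {symbol: actual value} for each required symbol that is not =y."""
    values = {}
    for line in config_text.splitlines():
        match = SET_LINE.match(line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    problems = {}
    for symbol in required:
        value = values.get(symbol, "not set")  # "# ... is not set" lines never match
        if value != "y":
            problems[symbol] = value           # e.g. "m" means module, not built in
    return problems
```

An empty result means both options are built in; any entry names the option and what it is currently set to.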

Please make sure to use a recent kernel; 3.4 or 3.5 is fine.

Thanks.


-- 
Regards/Gruss,
    Boris.


* Re: AMD Bulldozer FX-8150 Powers off during kernel build
From: Sid Boyce @ 2012-09-13 21:58 UTC
  To: Borislav Petkov, LKML Mailing List, Andreas Herrmann

# uname -r
3.6.0-rc5-u1-smp+

I built a new 3.6-rc5 kernel (3.6.0-rc5-u2) using 3.6.0-rc5-u1 with 8
cores, and the power-off did not occur.
slipstream:/usr/src/linux-3.6.0-rc5-u1 # grep POWER .config
# CONFIG_ACPI_PROCFS_POWER is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_X86_POWERNOW_K8=m
# CONFIG_PCIEASPM_POWERSAVE is not set
CONFIG_INPUT_POWERMATE=m
CONFIG_IPMI_POWEROFF=m
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_PDA_POWER=m
CONFIG_TEST_POWER=m
CONFIG_POWER_AVS=y
CONFIG_SENSORS_FAM15H_POWER=m
CONFIG_SENSORS_ACPI_POWER=m
CONFIG_SND_AC97_POWER_SAVE=y
CONFIG_SND_AC97_POWER_SAVE_DEFAULT=0
# CONFIG_SND_HDA_POWER_SAVE is not set
# CONFIG_HID_LCPOWER is not set
CONFIG_DEVFREQ_GOV_POWERSAVE=y
CONFIG_EVENT_POWER_TRACING_DEPRECATED=y
# CONFIG_XZ_DEC_POWERPC is not set

When the machine was powering off, "CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y"
was set; in the current config it is not:
slipstream:/usr/src/linux-3.6.0-rc5-u1 # grep PERFORMANCE .config
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_PCIEASPM_PERFORMANCE=y
CONFIG_DEVFREQ_GOV_PERFORMANCE=y

slipstream:/usr/src/linux-3.6.0-rc5-u1 # grep MCE .config
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_X86_MCE_INJECT is not set
CONFIG_EDAC_DECODE_MCE=y
# CONFIG_EDAC_MCE_INJ is not set

During the build, temperature and power were around these values:
-------------------------------------------------------------------------------------
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:      133.30 W  (crit = 124.77 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +61.9°C  (high = +70.0°C)
                        (crit = +90.0°C, hyst = +87.0°C)

Immediately after the build, the values are much lower than they were
with the kernel and config that caused the power-off.
----------------------------------------
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       31.10 W  (crit = 124.77 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +33.2°C  (high = +70.0°C)
                        (crit = +90.0°C, hyst = +87.0°C)
------------------------------------------
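Because the machine powers off without warning, readings taken by hand can miss the moments just before it happens. A Python sketch that samples the same two sensors during a build and forces each line to disk with `os.fsync`, so the last samples should survive the power-off. Assumptions: the standard hwmon sysfs units (temp1_input in millidegrees Celsius, power1_input in microwatts); the attribute files sitting either directly in /sys/class/hwmon/hwmonN or, on older kernels, in hwmonN/device; and the hypothetical helper names.

```python
import os
import time
from pathlib import Path


def find_hwmon(name, root="/sys/class/hwmon"):
    """Return the directory holding the attributes of hwmon device `name`."""
    for entry in sorted(Path(root).iterdir()):
        # Newer kernels keep attributes in hwmonN itself; older ones in hwmonN/device.
        for candidate in (entry, entry / "device"):
            name_file = candidate / "name"
            if name_file.is_file() and name_file.read_text().strip() == name:
                return candidate
    return None


def read_sample(temp_dir, power_dir):
    """Read temp1 in degrees Celsius and power1 in watts."""
    temp_c = int((temp_dir / "temp1_input").read_text()) / 1000.0          # millidegrees C
    power_w = int((power_dir / "power1_input").read_text()) / 1_000_000.0  # microwatts
    return temp_c, power_w


def log_forever(logfile, interval_s=2.0):
    """Append one timestamped line per sample to `logfile` until interrupted."""
    temp_dir = find_hwmon("k10temp")
    power_dir = find_hwmon("fam15h_power")
    if temp_dir is None or power_dir is None:
        raise SystemExit("k10temp or fam15h_power hwmon device not found")
    with open(logfile, "a") as out:
        while True:
            temp_c, power_w = read_sample(temp_dir, power_dir)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(f"{stamp} temp1={temp_c:.1f}C power1={power_w:.2f}W\n")
            out.flush()
            os.fsync(out.fileno())  # push the line to disk before the next sample
            time.sleep(interval_s)
```

For example, run `log_forever("/root/hwmon-build.log")` (path hypothetical) in one terminal and the kernel build in another; after the next power-off, the tail of that file holds the last temperature and power readings.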

If needed, I can go back to the earlier 3.6.0-rc5 kernel and config to
recreate the power-off. With the kernel that powered off, MCE was not set
and CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y was set.

For the 3.6.0-rc5-u1 kernel, only those two options were changed.
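The kernel tree ships scripts/diffconfig for comparing two .config files. A simplified Python sketch of the same comparison, useful for confirming that only the two options above differ between the kernel that powered off and 3.6.0-rc5-u1. Treating a symbol missing from one file as "n" is a simplification (an absent symbol may just have unmet dependencies), and the config fragments below are illustrative, not Sid's actual files.

```python
import re

SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_LINE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol to its value; "# ... is not set" becomes "n"."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_LINE.match(line) or UNSET_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2) if match.re is SET_LINE else "n"
    return options


def diff_configs(old_text, new_text):
    """Return {symbol: (old value, new value)} for every symbol that changed."""
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for symbol in sorted(old.keys() | new.keys()):
        before, after = old.get(symbol, "n"), new.get(symbol, "n")  # absent -> "n"
        if before != after:
            changes[symbol] = (before, after)
    return changes


# Illustrative fragments only, not the real .config files from this thread.
OLD = """CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_X86_MCE is not set
"""
NEW = """# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_X86_MCE=y
CONFIG_EDAC_DECODE_MCE=y
"""

for symbol, (before, after) in diff_configs(OLD, NEW).items():
    print(f"{symbol}: {before} -> {after}")
```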
Regards
Sid.




* Re: AMD Bulldozer FX-8150 Powers off during kernel build
From: Borislav Petkov @ 2012-09-13 22:28 UTC
  To: Sid Boyce; +Cc: LKML Mailing List, Andreas Herrmann

On Thu, Sep 13, 2012 at 10:58:49PM +0100, Sid Boyce wrote:
> If needed I can go back to the earlier 3.6.0-rc5 kernel and config to
> recreate the power off situation. With the kernel that powered off,
> MCE was not set and CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y

Yes, as I suggested earlier, enable CONFIG_X86_MCE and
CONFIG_EDAC_DECODE_MCE and *then* try recreating the reboot.

After it reboots, catch the whole dmesg and send it to me.
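A minimal Python sketch of the capture step: run `dmesg` once after the first boot following a power-off and keep the whole output in a timestamped file to send on. Reading the kernel log can require root when the kernel.dmesg_restrict sysctl is enabled; the helper names and file naming are hypothetical, and `dmesg > file` in a shell does the same job.

```python
import subprocess
import time
from pathlib import Path


def capture_dmesg():
    """Return the full kernel ring buffer as printed by `dmesg`."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


def save_dmesg(text, directory="."):
    """Write a dmesg capture to a timestamped file and return its path."""
    path = Path(directory) / time.strftime("dmesg-%Y%m%d-%H%M%S.txt")
    path.write_text(text)
    return path
```

Calling `save_dmesg(capture_dmesg())` as early as possible after boot keeps the boot-time messages before later output can push them out of the ring buffer.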

Thanks.

-- 
Regards/Gruss,
    Boris.

Thread overview: 4 messages
2012-09-13  1:30 AMD Bulldozer FX-8150 Powers off during kernel build Sid Boyce
2012-09-13  9:44 ` Borislav Petkov
2012-09-13 21:58   ` Sid Boyce
2012-09-13 22:28     ` Borislav Petkov
