* Freezing system after kernel 3.2
From: Karsten Malcher @ 2016-02-08 18:08 UTC
  To: linux-kernel

Hello,

I am sorry, but is it possible that a kernel bug affecting a specific chipset has been present since kernel 3.2?

I have a backup PC with an Asrock ALiveXFire eSATA2 R3.0 mainboard with an AMD64 X2 6000+ CPU.
Previously this mainboard ran very stably with Debian wheezy and kernel 3.2.0.
Now I have tried to update to Jessie with kernel 3.16, and the board crashes within 1-5 minutes after boot!

There is no clear error; mostly the system just freezes suddenly without any message or log.
Sometimes I get a kernel panic like "fatal exception in interrupt", but the boot parameter "pci=nomsi" has no effect.

I can rule out hardware problems, because memtest can run for many hours without finding anything.
Additionally, I can boot Knoppix 6.7 with kernel 3.0.4 and it runs stably.
But when I boot Knoppix 7.2 with kernel 3.9.6, the system freezes!
Additionally, I tried kernel 4.3.0 in Debian, but it does not help. Every newer kernel freezes.

I am sure that newer kernels have a problem with this particular mainboard hardware.

lspci shows:

00:00.0 Host bridge: ATI Technologies Inc RS480 Host Bridge (rev 01)
00:02.0 PCI bridge: ATI Technologies Inc RS480 PCI-X Root Port
00:05.0 PCI bridge: ATI Technologies Inc RS480 PCI Bridge
00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0)
00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1)
00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2)
00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3)
00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4)
00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI)
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 13)
00:14.1 IDE interface: ATI Technologies Inc SB600 IDE
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
02:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3600 Series
02:00.1 Audio device: ATI Technologies Inc RV635 Audio device [Radeon HD 3600 Series]

I am hoping that someone knows of problems with parts of this mainboard's hardware.

Best regards
Karsten

* Re: Freezing system after kernel 3.2
From: Ken Moffat @ 2016-02-09  3:51 UTC
  To: Karsten Malcher; +Cc: linux-kernel

On Mon, Feb 08, 2016 at 07:08:09PM +0100, Karsten Malcher wrote:
> Hello,
> 
> I am sorry, but is it possible that a kernel bug affecting a specific chipset has been present since kernel 3.2?
> 
On uncommon hardware, anything is possible.  I don't actually know
if that hardware is "uncommon", only that I do not have it.

> I have a backup PC with an Asrock ALiveXFire eSATA2 R3.0 mainboard with an AMD64 X2 6000+ CPU.
> Previously this mainboard ran very stably with Debian wheezy and kernel 3.2.0.
> Now I have tried to update to Jessie with kernel 3.16, and the board crashes within 1-5 minutes after boot!
>
> There is no clear error; mostly the system just freezes suddenly without any message or log.
> Sometimes I get a kernel panic like "fatal exception in interrupt", but the boot parameter "pci=nomsi" has no effect.
>
> I can rule out hardware problems, because memtest can run for many hours without finding anything.

LOL.  I have a Phenom X4: from time to time (fairly frequently) it
loses its lunch during compiles if I use make -j4.  On less-frequent
occasions it does the same even with make -j1.  And memtest86+-5.01
is always happy [ well, if I use the "run all CPUs [F2]" option it
locks up, but it does that on at least two other mobos too: one of
those is an Intel Sandy Bridge, so that issue is not AMD-specific ].

> Additionally, I can boot Knoppix 6.7 with kernel 3.0.4 and it runs stably.
> But when I boot Knoppix 7.2 with kernel 3.9.6, the system freezes!
> Additionally, I tried kernel 4.3.0 in Debian, but it does not help. Every newer kernel freezes.
>
> I am sure that newer kernels have a problem with this particular mainboard hardware.
> 

If nobody else has better suggestions, I think you will have to
build upstream kernels to find when it broke.  I suggest that you
begin with standard 3.2.latest (just in case you turned out to rely
on something in the debian kernel but not upstream).  Then try
3.9.latest : if that runs ok, continue with 3.16.latest.  If not,
try e.g. 3.4.latest.  The aim is to first find which minor release
broke, and then which update in that series broke it.  What you
*might* need to do is also try .0 versions of each of these.

I am suggesting that you bisect this.  Bisection is usually a pain,
so I suggest that you first find a working version, and then work
through the next stable release of that version to find which commit
broke it.  I suspect I might have phrased my suggestions badly, but
even 3.9 is so long ago that most of us have forgotten about it [ my
last box running a 3.10 LTS kernel was a ppc64, and I have not
booted that for about 2 years ].
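
A minimal sketch of how such a bisection could be started with git, assuming a clone of the mainline kernel tree and that the mainline tags v3.2 and v3.9 bracket the breakage. The helper name start_bisect and the path in the example are hypothetical. Because the freeze can take a few minutes to show up, each "good" verdict needs a test run comfortably longer than that.

```shell
# Sketch only: start a bisection between a known-good and a known-bad tag.
# start_bisect is a hypothetical helper; the tree must be an existing git
# checkout of the kernel that contains both tags.
start_bisect() (
    tree="$1"   # path to the kernel git tree
    good="$2"   # last release seen running stable, e.g. v3.2
    bad="$3"    # first release seen freezing, e.g. v3.9
    cd "$tree" || exit 1
    git bisect start "$bad" "$good"
)

# Example (path is an assumption): start_bisect ~/src/linux v3.2 v3.9
# Then build and boot the commit that git checks out and report the result:
#   git bisect good     # ran without freezing for long enough
#   git bisect bad      # froze
# Repeat until git names the first bad commit, then run: git bisect reset
```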

Good Luck, and I hope you get a better suggestion.

ĸen
-- 
This email was written using 100% recycled letters.

* Re: Freezing system after kernel 3.2
From: Karsten Malcher @ 2016-02-09 10:42 UTC
  To: Ken Moffat; +Cc: linux-kernel

Hello Ken,

Thank you for the answer!

> On uncommon hardware, anything is possible.  I don't actually know
> if that hardware is "uncommon", only that I do not have it.

Before I start debugging kernel versions, I tried to find other problem reports for this mainboard.
And I found some! http://napalmpiri.info/tag/freeze/
Yes - this is uncommon hardware!

It seems that this mainboard model falls short of the usual Asrock quality.
The BIOS in particular seems to be buggy and has never been completely fixed. :-(

> LOL.  I have a Phenom X4: from time to time (fairly frequently) it
> loses its lunch during compiles if I use make -j4.  On less-frequent
> occasions it does the same even with make -j1.  And memtest86+-5.01
> is always happy [ well, if I use the "run all CPUs [F2]" option it
> locks up, but it does that on at least two other mobos too: one of
> those is an Intel Sandy Bridge, so that issue is not AMD-specific ].

The solution seems to be: never change a running system when you have one. :-)
But after a couple of years you have to change it, especially when parts break.


> If nobody else has better suggestions, I think you will have to
> build upstream kernels to find when it broke.  I suggest that you
> begin with standard 3.2.latest (just in case you turned out to rely
> on something in the debian kernel but not upstream).  Then try
> 3.9.latest : if that runs ok, continue with 3.16.latest.  If not,
> try e.g. 3.4.latest.  The aim is to first find which minor release
> broke, and then which update in that series broke it.  What you
> *might* need to do is also try .0 versions of each of these.


Maybe I will try this.
But currently I think it is not a bug related to the chipset.
This mainboard has a bad BIOS and the last update is from 2011.
There is no hope that the problems will be fixed. :-(


> Good Luck, and I hope you get a better suggestion.

I was able to restore an old backup with Debian wheezy and kernel 3.2.
This runs stably on this mainboard, and I think it will be the last release that is usable there.

Karsten

* Re: Freezing system after kernel 3.2
From: Karsten Malcher @ 2016-02-09 16:03 UTC
  To: Ken Moffat; +Cc: linux-kernel

I have since found out that the freezing can occur under kernel 3.2 too, but far less often.
So the interesting question is why this occurs so much more often with newer kernels.

I found a solution to the problem in the linked blog.
When you disable Cool'n'Quiet, the system runs stably with newer kernels too!

So it seems to be a buggy BIOS that is causing 2 big problems:
1. Only 3264 MB of the 4096 MB of RAM can be used.
2. When Cool'n'Quiet is enabled, the system freezes within a minute.

I have reported this to Asrock, but I don't think they will do anything for an older mainboard ...

You can find several reports of freezing with the RS480 chip,
but none in connection with frequency scaling (Cool'n'Quiet).
These problems occur with other chipsets too.

How does the frequency scaling work?
I have found some hints that disabling the C1 and C6 states helps to solve the problem:

https://forum.teksyndicate.com/t/amd-system-keeps-freezing/78380/17
https://rog.asus.com/forum/showthread.php?60224-amd-system-keeps-freezing&s=03842b8ef2cb1a25bc458b6fe56c9213&p=492388&viewfull=1#post492388
https://community.amd.com/message/2645600#2645600

But I have no settings for these states in my BIOS.
Has the behaviour of the frequency scaling drivers changed somehow?

* Re: Freezing system after kernel 3.2
From: Karsten Malcher @ 2016-02-10 15:12 UTC
  To: Ken Moffat; +Cc: linux-kernel

Hello Ken,

On 09.02.2016 at 20:42, Ken Moffat wrote:
> On Tue, Feb 09, 2016 at 05:03:57PM +0100, Karsten Malcher wrote:
>
>> I have since found out that the freezing can occur under kernel 3.2 too, but far less often.
>> So the interesting question is why this occurs so much more often with newer kernels.
>>
> An interesting question, but almost impossible to track down (I've
> had a problem myself where the "is it ok?" question could not be
> reliably answered - all you can do is find that some kernel versions
> (or some stable revisions) seem worse / less bad).

In my case it seems clear that something is wrong with the frequency scaling.
cpufreq-info shows:
driver: acpi-cpufreq
governor: "ondemand"
current frequency: 800 MHz

You can find some good explanations here:
https://wiki.archlinux.org/index.php/CPU_frequency_scaling#CPU_frequency_driver

acpi-cpufreq: "CPUFreq driver which utilizes the ACPI Processor Performance States."
So the driver does not pick the CPU settings on its own; it relies on what ACPI provides.
The ACPI tables come from the BIOS, so bugs in the BIOS can plausibly cause the problem there!
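
As a rough illustration, the same driver and governor information can be read straight from sysfs without cpufreq-info. This sketch assumes the standard cpufreq layout under /sys/devices/system/cpu; show_cpufreq and its optional root argument are hypothetical names added for illustration.

```shell
# Print the cpufreq driver, governor and current frequency for every CPU.
# The optional first argument overrides the sysfs root (useful for testing);
# by default the live /sys/devices/system/cpu tree is read.
show_cpufreq() (
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq support (or an unmatched glob)
        [ -r "$dir/scaling_driver" ] || continue
        cpu=$(basename "$(dirname "$dir")")
        driver=$(cat "$dir/scaling_driver")
        governor=$(cat "$dir/scaling_governor" 2>/dev/null)
        # scaling_cur_freq is reported in kHz
        khz=$(cat "$dir/scaling_cur_freq" 2>/dev/null)
        echo "$cpu driver=$driver governor=$governor freq=$(( ${khz:-0} / 1000 ))MHz"
    done
)

# Usage: show_cpufreq
```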


>> I found a solution to the problem in the linked blog.
>> When you disable Cool'n'Quiet, the system runs stably with newer kernels too!
>>
> Ouch.  Full heat, higher power bills.

Yes - that's not nice!

>
>> So it seems to be a buggy BIOS that is causing 2 big problems:
>> 1. Only 3264 MB of the 4096 MB of RAM can be used.
>> 2. When Cool'n'Quiet is enabled, the system freezes within a minute.
>>
>> I have reported this to Asrock, but I don't think they will do anything for an older mainboard ...
>>
>> You can find several reports of freezing with the RS480 chip,
>> but none in connection with frequency scaling (Cool'n'Quiet).
>> These problems occur with other chipsets too.
>>
>> How does the frequency scaling work?
>> I have found some hints that disabling the C1 and C6 states helps to solve the problem:
>>
>> https://forum.teksyndicate.com/t/amd-system-keeps-freezing/78380/17
>> https://rog.asus.com/forum/showthread.php?60224-amd-system-keeps-freezing&s=03842b8ef2cb1a25bc458b6fe56c9213&p=492388&viewfull=1#post492388
>> https://community.amd.com/message/2645600#2645600
>>
>> But I have no settings for these states in my BIOS.
> Google found a post (in Chinese, I think) which mentioned
>
> echo 1 > /sys/devices/system/cpu/cpuX/cpuidle/stateY/disable
>
> where X and Y should be replaced by the cpu number and the state
> number.  I would be very reluctant to try that, particularly for C1
> because that is the standard halt state and power usage might go up
> significantly.

I will have a closer look at this.
But if all the functionality goes through the BIOS, I am not hopeful.
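
Before disabling anything, the idle states the kernel actually exposes can be listed. A small sketch, assuming the cpuidle sysfs layout used in the echo command quoted above; list_cpuidle_states and its optional root argument are hypothetical.

```shell
# List every cpuidle state with its name and whether it is disabled.
# The optional first argument overrides the sysfs root (useful for testing).
list_cpuidle_states() (
    root="${1:-/sys/devices/system/cpu}"
    for state in "$root"/cpu[0-9]*/cpuidle/state[0-9]*; do
        # Skip when cpuidle is not available (or the glob did not match)
        [ -r "$state/name" ] || continue
        cpu=$(basename "$(dirname "$(dirname "$state")")")
        name=$(cat "$state/name")
        disabled=$(cat "$state/disable" 2>/dev/null)
        echo "$cpu $(basename "$state") name=$name disabled=$disabled"
    done
)

# Usage: list_cpuidle_states
```

If a C1 or C6 entry shows up for a CPU, its state number is the Y to use in the quoted command; if none is listed, there is nothing to disable this way.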


>
>> Has the behaviour of the frequency scaling drivers changed somehow?
> The behaviour of the cpufreq drivers has definitely changed over the
> years.  When I bought my phenom x4 I also bought an i3 SandyBridge
> intel.  At that time (2012) both were running the ondemand governor
> and the phenom was faster in all my compiles (I build
> linuxfromscratch and work on beyond linuxfromscratch, as well as
> trying to keep an eye on test kernels in case there are problems for
> my hardware).

Out of interest I had a look at the sources of the ACPI driver.
But they are not understandable without background knowledge of the details. :-)

As far as I know, there is a table stored in the CPU that holds the possible settings
for voltage, multiplier and states.
The BIOS should read and manage this and expose it to the OS as standard ACPI functionality.
Really complex.

> At some point, the intel governor moved to 'performance' - I changed
> to the altered kernel driver when it was first available (on intels
> 'performance' works well - generally low power except when busy).

But you can switch between the different scaling governors.
https://wiki.archlinux.org/index.php/CPU_frequency_scaling#Scaling_governors

Normally this should cover one's needs.
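
One way to switch governors from a shell, sketched under the assumption that the cpufreq sysfs files are present and the script runs as root. set_governor is a hypothetical helper; it only writes a governor that the kernel lists in scaling_available_governors. The performance governor keeps the CPU at its highest allowed frequency, which may be worth testing as an alternative to turning off Cool'n'Quiet in the BIOS, although nothing in this thread confirms that it avoids the freeze.

```shell
# Set the given scaling governor on every CPU that offers it.
# $1 is the governor name; the optional $2 overrides the sysfs root
# (useful for testing). Writing to the live /sys tree requires root.
set_governor() (
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -w "$dir/scaling_governor" ] || continue
        # Only write a governor the kernel reports as available
        if grep -qw -- "$governor" "$dir/scaling_available_governors" 2>/dev/null; then
            echo "$governor" > "$dir/scaling_governor"
        else
            echo "governor $governor not available in $dir" >&2
        fi
    done
)

# Usage (as root): set_governor performance
```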

> A bit later, there were changes affecting AMD - I forget the details,
> I think early K8s were not altered, only K10 and later.  Since then,
> my phenom has been slower than the i3 when doing single-threaded
> compiles.  Some of that might be because jobs deliberately bounce
> around the cores to even out wear and heat, possibly on my AMD
> machines (I've also got a recent A10) there is a cache penalty when
> moving between the processor 'units'.

I read a German article about overclocking Intel CPUs a few days ago.
http://www.heise.de/newsticker/meldung/Skylake-Overclocking-Katz-und-Mausspiel-zwischen-Intel-und-Mainboard-Herstellern-3097870.html
Maybe you can translate it with Google.
With some BIOSes and CPUs it is possible to overclock in a way that is otherwise only available with much more
expensive types of CPU.

> At the risk of pissing you off with tales of new hardware, I suspect
> there are still changes in the kernel cpufreq area - I bought a
> haswell i7 last week, nominally 3.6 GHz but the specs say it can
> boost a thread to 3.9GHz.  I started with SystemRescueCD and saw the
> frequencies at 3.6 GHz maximum, and often at lower frequencies when
> idle.  Then I installed Mint, I think the frequencies were similar.
> After that I put Fedora23 on it and was shocked to see that all
> cores were often around 3.9GHz - what was odd was that the system
> power draw (PC, monitor, net switch, kvm switch) remained at about
> 90 Watts when idle, up to about 166W when running make -j 8.

I could not find any information about what has changed over time,
but here is some interesting documentation:

http://kernel.org/doc/Documentation/cpu-freq/governors.txt
http://kernel.org/doc/Documentation/cpu-freq/cpu-drivers.txt
http://www.pantz.org/software/cpufreq/usingcpufreqonlinux.html
http://unix.stackexchange.com/questions/121410/setting-cpu-governor-to-on-demand-or-conservative

>
> Now that I have installed linuxfromscratch (a binary copy from the
> SandyBridge, then I used that to compile a current version with a
> 4.4.1 kernel) I never see frequencies less than 3.7 GHz reported,
> but the power consumption is nice and low when idle.

I was always satisfied with the functionality of the Linux frequency drivers.
But it is not nice to have to do without it.

>
> Summary: over time, everything changes.

And gets more complicated.
The time will come when nobody can solve problems when they occur.

>
> Good luck with taming that machine to a mostly-usable state.  The
> only thing I ever had to do to get a box stable was on an early K8,
> and with two pairs of memory sticks I had to back off the memory
> frequency from the default.  But that was years ago, and the
> then-current version of memtest86 showed the problem in an
> overnight run.  Note that my phenom continues as "from time to time,
> unstable" (some odd crashes, many internal compiler errors).

It's better to have a stable PC that runs at full speed than a PC that freezes and is unusable.
But it is no fun to search for and fix such problems.

Karsten
