* Problems with system lockup
@ 2010-11-18 17:26 kd1zd
2010-11-18 18:42 ` Dave Platt
0 siblings, 1 reply; 5+ messages in thread
From: kd1zd @ 2010-11-18 17:26 UTC (permalink / raw)
To: linux-hams
I've been having problems recently with my Linux system. It randomly locks up, and I was wondering if anyone out there has experienced problems with a hung system.
My system is CentOS 5.5, and is running UROnode and JNOS. I have been having segmentation violation issues with JNOS, which I'm trying to figure out. I have two serial ports which drive TNCs connected to radios.
When I say the system is hung, I mean that the only thing that will liven it is a hard reboot or power cycle. Anyone else troubleshoot these issues before? I'm trying to run a 24/7 TCP/IP node and BBS, but these lockups make it very hard to do so.
Thanks,
--
Robert Thoelen, KD1ZD
Check out http://www.rtcubed.org/kd1zd,
packet radio in CT
Station phone: (860) 698-0502
Station email kd1zd <at> rtcubed.org
* Re: Problems with system lockup
2010-11-18 17:26 Problems with system lockup kd1zd
@ 2010-11-18 18:42 ` Dave Platt
2010-11-18 19:13 ` Gordon JC Pearce
0 siblings, 1 reply; 5+ messages in thread
From: Dave Platt @ 2010-11-18 18:42 UTC (permalink / raw)
To: kd1zd; +Cc: linux-hams
kd1zd@rtcubed.org wrote:
> I've been having problems recently with my Linux system. It randomly locks up, and I was wondering if anyone out there has experienced problems with a hung system.
>
> My system is CentOS 5.5, and is running UROnode and JNOS. I have been having segmentation violation issues with JNOS, which I'm trying to figure out. I have two serial ports which drive TNCs connected to radios.
>
> When I say the system is hung, I mean that the only thing that will liven it is a hard reboot or power cycle. Anyone else troubleshoot these issues before? I'm trying to run a 24/7 TCP/IP node and BBS, but these lockups make it very hard to do so.
A number of different problems can cause these sorts of
lockups: sometimes hardware, sometimes software, sometimes
an interaction of the two. I've run into a bunch of them
over the years.
Some examples:
- Hardware problems on the motherboard, pure and simple...
bad DRAM, for example, or an overheating CPU due to a
fan failure, or overly-aggressive overclocking. It
wouldn't hurt to install, and then run the stand-alone
MEMTEST86+ check (let it run overnight, at least) to
see if there are DRAM or timing problems.
The fact that you're seeing segfaults in JNOS, as well
as complete freezes, brings this possibility to the
top of my UsualSuspects list.
- Power-supply problems... momentary voltage sags can
glitch a box pretty badly.
- Problems with the power-management code, in the kernel
or in the BIOS (e.g. Intel SpeedStep, or the AMD
equivalent). There have been a fair number of motherboards
and CPUs which don't handle the switching between different
processor clock speeds and voltages properly.
- PCI (or other) cards not plugged securely into their slots,
resulting in intermittent contacts that can cause all sorts
of confusion.
- Driver bugs. I had a nasty periodic full freeze on my new
firewall/server system at home, which turned out to be a bug
in the driver for the USB Ethernet dongle I was using to add
a third Ethernet port... it worked OK under light load but
froze the system solid under some heavy-load conditions. If
you've got a USB dongle which uses the "kaweth" driver, get
rid of it.
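As a rough aid for the overheating and power-management suspects in the list above, here is a minimal Python sketch that takes a snapshot of temperatures and cpufreq governors from sysfs. It assumes the kernel exposes `/sys/class/thermal/thermal_zone*/temp` (in millidegrees C) and `/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`; older kernels may expose neither, in which case the result is simply empty. The function names and the `sysfs_root` parameter are made up for illustration.

```python
import glob
import os


def read_first_line(path):
    """Return the first line of a file, stripped, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return None


def hardware_snapshot(sysfs_root="/sys"):
    """Collect thermal-zone temperatures (deg C) and cpufreq governors.

    Assumption: sysfs layout as described in the lead-in; anything missing
    or unreadable is skipped rather than treated as an error.
    """
    temps = {}
    zone_glob = os.path.join(sysfs_root, "class/thermal/thermal_zone*")
    for zone in sorted(glob.glob(zone_glob)):
        raw = read_first_line(os.path.join(zone, "temp"))
        if raw is not None and raw.lstrip("-").isdigit():
            temps[os.path.basename(zone)] = int(raw) / 1000.0  # millidegrees C

    governors = {}
    cpu_glob = os.path.join(sysfs_root, "devices/system/cpu/cpu[0-9]*")
    for cpu in sorted(glob.glob(cpu_glob)):
        gov = read_first_line(os.path.join(cpu, "cpufreq/scaling_governor"))
        if gov is not None:
            governors[os.path.basename(cpu)] = gov

    return temps, governors
```

Calling `hardware_snapshot()` from cron every minute and appending the result to a file would give a record of what the box looked like shortly before a freeze. It does not replace a memory test.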
Something you may be able to do, as a very short term ugly
workaround, is to use a hardware-based watchdog to reboot
the system if it freezes. A lot of motherboards these days
use a "Super IO" chip which incorporates such a watchdog, and
Linux has a driver and utility program to access it. Start up
the watchdog program (in the "no exit allowed" mode), and if
the watchdog program doesn't wake up and successfully poke the
watchdog chip's registers every ten seconds or so, the chip will
yank the board's /RESET line and do a hard reboot. Nasty, but
perhaps better than a day-long hang until you can get home
and push the Big Red Button yourself. As the man said in
Young Frankenstein, "A riot is an ugly thing... and I think
it's about time we had one!!!"
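To make the idea concrete, here is a minimal Python sketch of the "poke the watchdog" loop. It is not the actual Linux watchdog utility, only a stand-in, and it assumes the driver exposes the usual `/dev/watchdog` character device, where any write resets the countdown and a final `V` (the "magic close" character) asks the driver to disarm on close. In "no exit allowed" (nowayout) mode the driver ignores that request. The function name and its parameters are hypothetical.

```python
import os
import time


def pet_watchdog(device="/dev/watchdog", interval=10.0, beats=None, disarm=True):
    """Write to the watchdog device every `interval` seconds.

    beats=None loops forever; a number stops after that many writes.
    Assumption: the device follows the standard Linux watchdog interface.
    If this loop stops running (system hang), the writes stop and the
    hardware resets the board once its timeout expires.
    """
    fd = os.open(device, os.O_WRONLY)  # opening the device arms the timer
    count = 0
    try:
        while beats is None or count < beats:
            os.write(fd, b"\0")  # any write resets the countdown
            count += 1
            if beats is None or count < beats:
                time.sleep(interval)
    finally:
        if disarm:
            os.write(fd, b"V")  # magic close: request disarm on close
        os.close(fd)
    return count
```

The interval has to be comfortably shorter than the chip's timeout, otherwise a healthy but busy system gets rebooted too.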
* Re: Problems with system lockup
2010-11-18 18:42 ` Dave Platt
@ 2010-11-18 19:13 ` Gordon JC Pearce
0 siblings, 0 replies; 5+ messages in thread
From: Gordon JC Pearce @ 2010-11-18 19:13 UTC (permalink / raw)
To: linux-hams
On Thu, 2010-11-18 at 10:42 -0800, Dave Platt wrote:
> - Hardware problems on the motherboard, pure and simple...
> bad DRAM, for example, or an overheating CPU due to a
> fan failure, or overly-aggressive overclocking. It
> wouldn't hurt to install, and then run the stand-alone
> MEMTEST86+ check (let it run overnight, at least) to
> see if there are DRAM or timing problems.
This ^^^. It smells hardware-y. I'd be really surprised if JNOS could
do anything to hard-lock the system - typically even if something is
poking really low-level drivers the worst it will do is cause a kernel
panic. Okay, that *will* basically lock up your system, but it should
be more informative than "just plain catatonic".
Very nearly all "just locks up" problems I've run across have been down
to dying memory or overheating CPUs. In the past I've found that the
latter has been quite good for causing segfaults when the CPU is driven
hard, like compiling code or rendering video.
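One way to tell "informative" failures from plain catatonia is to look through the logs after each reboot. Below is a minimal Python sketch that sorts log lines into a few buckets. The patterns are assumptions about typical kernel message wording (for example "segfault at", "Kernel panic", "soft lockup") and may need adjusting for a given kernel; the names `PATTERNS` and `scan_log` are made up for illustration.

```python
import re

# Assumed wording of typical kernel messages; adjust to match your logs.
PATTERNS = {
    "segfault": re.compile(r"segfault"),
    "oops": re.compile(r"\bOops\b"),
    "panic": re.compile(r"Kernel panic", re.IGNORECASE),
    "machine_check": re.compile(r"Machine check|\bMCE\b", re.IGNORECASE),
    "soft_lockup": re.compile(r"soft lockup", re.IGNORECASE),
}


def scan_log(lines):
    """Group log lines by the kind of failure they appear to mention."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits
```

It can be fed any iterable of lines, for instance an open syslog file such as `/var/log/messages` if that is where the distribution writes kernel messages. Segfaults spread across unrelated programs, or a freeze that leaves nothing at all in the log, would both point towards hardware.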
Gordon MM0YEQ
* Re: Problems with system lockup
2010-11-19 13:32 kd1zd
@ 2010-11-19 14:50 ` Gordon JC Pearce
0 siblings, 0 replies; 5+ messages in thread
From: Gordon JC Pearce @ 2010-11-19 14:50 UTC (permalink / raw)
To: linux-hams
On Fri, 2010-11-19 at 13:32 +0000, kd1zd@rtcubed.org wrote:
> Thanks for all the responses. After reading them, I realized I've seen more crashes in the last week, when I added a USB hard drive enclosure to the system. I removed it to see if the crashes are less frequent.
>
> Also, I do run one of the TNCs off of a USB to serial dongle; maybe that also has something to do with it.
Shouldn't make a difference, although some USB interfaces don't drive
the RS-232 side hard enough for some older TNCs to work.
> I will at some point give the memory test a try. That may also help pinpoint if there are other things that are problems.
It's definitely worth doing sooner rather than later.
> Can any of you recommend a watchdog card?
Not really, no. If your machine keeps locking up, it's because it's
faulty. Treat the fault, not the symptom.
Gordon MM0YEQ
* Re: Problems with system lockup
@ 2010-11-19 13:32 kd1zd
2010-11-19 14:50 ` Gordon JC Pearce
0 siblings, 1 reply; 5+ messages in thread
From: kd1zd @ 2010-11-19 13:32 UTC (permalink / raw)
To: linux-hams
Thanks for all the responses. After reading them, I realized I've seen more crashes in the last week, when I added a USB hard drive enclosure to the system. I removed it to see if the crashes are less frequent.
Also, I do run one of the TNCs off of a USB to serial dongle; maybe that also has something to do with it.
I will at some point give the memory test a try. That may also help pinpoint if there are other things that are problems.
Can any of you recommend a watchdog card?
end of thread, other threads:[~2010-11-19 14:50 UTC | newest]
2010-11-18 17:26 Problems with system lockup kd1zd
2010-11-18 18:42 ` Dave Platt
2010-11-18 19:13 ` Gordon JC Pearce
2010-11-19 13:32 kd1zd
2010-11-19 14:50 ` Gordon JC Pearce