* Serious bug in sata_sil module in 2.6.19.2?
@ 2007-02-01 10:24 Florian Effenberger
  2007-02-01 14:41 ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Effenberger @ 2007-02-01 10:24 UTC (permalink / raw)
  To: jgarzik; +Cc: linux-ide

Hi there,

I am not sure if the sata_sil module is the culprit, but I think so. We
ran a 2.6.15.1 kernel for months without any problems. Now I tried to 
switch to 2.6.19.2, and every night when our backup routine runs, the 
machine locks up. No log entry, no automatic reboot 
(/proc/sys/kernel/panic is set to 30 [sec]).

The display looks as if there were a video error, with lots of bad and
flickering characters, and it is pretty hard to read the error message.
I have an image and a video of that, if you need it. The only thing I 
could read is something with "ata". After the reboot, the RAID devices 
are out of sync and need to resync again.

Booting up 2.6.15.1 on the same machine cures the problem, everything 
works just fine, no lockups.

Is there any chance of debugging that problem, of finding the culprit?

Any help would be greatly appreciated!

Thanks
Florian


* Re: Serious bug in sata_sil module in 2.6.19.2?
  2007-02-01 10:24 Serious bug in sata_sil module in 2.6.19.2? Florian Effenberger
@ 2007-02-01 14:41 ` Tejun Heo
  2007-02-01 23:39   ` Florian Effenberger
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2007-02-01 14:41 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: jgarzik, linux-ide

Florian Effenberger wrote:
> Hi there,
> 
> I am not sure if the sata_sil module is the culprit, but I think so. We
> ran a 2.6.15.1 kernel for months without any problems. Now I tried to
> switch to 2.6.19.2, and every night when our backup routine runs, the
> machine locks up. No log entry, no automatic reboot
> (/proc/sys/kernel/panic is set to 30 [sec]).
> 
> The display looks as if there were a video error, with lots of bad and
> flickering characters, and it is pretty hard to read the error message.
> I have an image and a video of that, if you need it. The only thing I
> could read is something with "ata". After the reboot, the RAID devices
> are out of sync and need to resync again.
> 
> Booting up 2.6.15.1 on the same machine cures the problem, everything
> works just fine, no lockups.
> 
> Is there any chance of debugging that problem, of finding the culprit?
> 
> Any help would be greatly appreciated!

sata_sil hasn't seen as much change as other drivers and is one of the
more stable ones.  I'm definitely interested.  Please post the pics.  If
the video contains useful info, can you host it somewhere?

-- 
tejun


* Re: Serious bug in sata_sil module in 2.6.19.2?
  2007-02-01 14:41 ` Tejun Heo
@ 2007-02-01 23:39   ` Florian Effenberger
  2007-02-06  6:30     ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Effenberger @ 2007-02-01 23:39 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, linux-ide

Hi there,

> sata_sil hasn't seen as much change as other drivers and is one of
> the more stable ones.  I'm definitely interested.  Please post the
> pics.  If the video contains useful info, can you host it somewhere?

I've uploaded the image at

http://img482.imageshack.us/img482/6677/img0149rh5.jpg

and the video at

http://video.google.de/videoplay?docid=-3785898339758585695

Can anyone detect something on the images?

(If you reply, please put me in Cc, I am not subscribed to the list. 
Thanks!)

Thanks!
Florian

* Re: Serious bug in sata_sil module in 2.6.19.2?
  2007-02-01 23:39   ` Florian Effenberger
@ 2007-02-06  6:30     ` Tejun Heo
  2007-02-06 10:45       ` Florian Effenberger
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2007-02-06  6:30 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: jgarzik, linux-ide

Florian Effenberger wrote:
> Hi there,
> 
>> sata_sil hasn't seen as much change as other drivers and is one of
>> the more stable ones.  I'm definitely interested.  Please post the
>> pics.  If the video contains useful info, can you host it somewhere?
> 
> I've uploaded the image at
> 
> http://img482.imageshack.us/img482/6677/img0149rh5.jpg
> 
> and the video at
> 
> http://video.google.de/videoplay?docid=-3785898339758585695
> 
> Can anyone detect something on the images?
> 
> (If you reply, please put me in Cc, I am not subscribed to the list.
> Thanks!)

Those definitely look like libata error messages, but I can't tell
anything more than that from the image.  They could be the cause of the
system hang and the weird screen, or just a symptom of another problem.

Is it possible for you to connect a serial console or configure
netconsole (Documentation/networking/netconsole.txt) so that the
messages are preserved after such a hang occurs?  Also, please turn on
PRINTK_TIME (Kernel Hacking -> Show timing information on printks) so
that we can tell what happens when.  To make the info more useful, you
can log into the machine from another machine and run something like
"while true; do sleep 1; date; done" on it so that you can tell exactly
when the machine went down.
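
The timestamp loop above could be wrapped in a small shell function, as
a minimal sketch.  The name heartbeat and its two arguments (interval in
seconds, number of lines to print) are assumptions made for
illustration; they are not part of any kernel tooling.

```shell
# Heartbeat sketch: print a wall-clock timestamp every INTERVAL seconds.
# "heartbeat" is a hypothetical helper name, not an existing tool.
#   $1 = interval in seconds (default 1)
#   $2 = number of timestamps to print (default 0 = run until interrupted)
heartbeat() {
    interval=${1:-1}
    count=${2:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        # One line per tick; the last line seen remotely marks the hang.
        date '+%Y-%m-%d %H:%M:%S'
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Run as "heartbeat 1" in a remote login session on the affected machine;
the last timestamp shown on the remote terminal before the output stops
gives the approximate time the machine went down.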

Thanks.

-- 
tejun


* Re: Serious bug in sata_sil module in 2.6.19.2?
  2007-02-06  6:30     ` Tejun Heo
@ 2007-02-06 10:45       ` Florian Effenberger
  2007-02-06 14:44         ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Effenberger @ 2007-02-06 10:45 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, linux-ide

Hi there,

> Those definitely look like libata error messages, but I can't tell
> anything more than that from the image.  They could be the cause of the
> system hang and the weird screen, or just a symptom of another problem.
> 
> Is it possible for you to connect a serial console or configure
> netconsole (Documentation/networking/netconsole.txt) so that the
> messages are preserved after such a hang occurs?  Also, please turn on
> PRINTK_TIME (Kernel Hacking -> Show timing information on printks) so
> that we can tell what happens when.  To make the info more useful, you
> can log into the machine from another machine and run something like
> "while true; do sleep 1; date; done" on it so that you can tell exactly
> when the machine went down.

thanks a lot for your feedback. It seems we solved the problem: it was
the power supply unit! We changed a few things (memory, cleaned the
machine and so on), but after replacing the power supply unit,
everything worked fine. So I guess the new kernel just draws a little
more power than the previous one.

Thanks a lot for your kind help, and sorry for the false alert! :-)

Florian


* Re: Serious bug in sata_sil module in 2.6.19.2?
  2007-02-06 10:45       ` Florian Effenberger
@ 2007-02-06 14:44         ` Tejun Heo
  2007-02-06 16:14           ` Florian Effenberger
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2007-02-06 14:44 UTC (permalink / raw)
  To: Florian Effenberger; +Cc: jgarzik, linux-ide

Florian Effenberger wrote:
> Hi there,
> 
>> Those definitely look like libata error messages, but I can't tell
>> anything more than that from the image.  They could be the cause of the
>> system hang and the weird screen, or just a symptom of another problem.
>>
>> Is it possible for you to connect a serial console or configure
>> netconsole (Documentation/networking/netconsole.txt) so that the
>> messages are preserved after such a hang occurs?  Also, please turn on
>> PRINTK_TIME (Kernel Hacking -> Show timing information on printks) so
>> that we can tell what happens when.  To make the info more useful, you
>> can log into the machine from another machine and run something like
>> "while true; do sleep 1; date; done" on it so that you can tell exactly
>> when the machine went down.
> 
> thanks a lot for your feedback. It seems we solved the problem: it was
> the power supply unit! We changed a few things (memory, cleaned the
> machine and so on), but after replacing the power supply unit,
> everything worked fine. So I guess the new kernel just draws a little
> more power than the previous one.
> 
> Thanks a lot for your kind help, and sorry for the false alert! :-)

Yeap, when power quality degrades, the first thing that breaks is SATA,
so that explains the error message.  There have been several SATA bug
reports which turned out to be PSU problems.  Good to have another
obvious data point on that.  :-)

-- 
tejun


* Re: Serious bug in sata_sil module in 2.6.19.2?
  2007-02-06 14:44         ` Tejun Heo
@ 2007-02-06 16:14           ` Florian Effenberger
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Effenberger @ 2007-02-06 16:14 UTC (permalink / raw)
  To: Tejun Heo; +Cc: jgarzik, linux-ide

Hi there,

> Yeap, when power quality degrades, the first thing that breaks is SATA,
> so that explains the error message.  There have been several SATA bug
> reports which turned out to be PSU problems.  Good to have another
> obvious data point on that.  :-)

thanks for clarifying. This was the first time I ran into that, and now
I'm much wiser. :-)

Thanks for your fast support, much appreciated!

Florian
