* Serious bug in sata_sil module in 2.6.19.2?
@ 2007-02-01 10:24 Florian Effenberger
2007-02-01 14:41 ` Tejun Heo
0 siblings, 1 reply; 7+ messages in thread
From: Florian Effenberger @ 2007-02-01 10:24 UTC (permalink / raw)
To: jgarzik; +Cc: linux-ide
Hi there,
I am not sure if the sata_sil-Module is the culprit, but I think so. We
ran a 2.6.15.1 kernel for months without any problems. Now I tried to
switch to 2.6.19.2, and every night when our backup routine runs, the
machine locks up. No log entry, no automatic reboot
(/proc/sys/kernel/panic is set to 30 [sec]).
The display looks as if there were a video error, with lots of garbled
and flickering characters, so the error message is very hard to read.
I have an image and a video of that, if you need them. The only thing I
could make out is something containing "ata". After the reboot, the RAID
devices are out of sync and need to resync.
Booting up 2.6.15.1 on the same machine cures the problem, everything
works just fine, no lockups.
Is there any chance of debugging that problem, of finding the culprit?
Any help would be greatly appreciated!
Thanks
Florian
* Re: Serious bug in sata_sil module in 2.6.19.2?
2007-02-01 10:24 Serious bug in sata_sil module in 2.6.19.2? Florian Effenberger
@ 2007-02-01 14:41 ` Tejun Heo
2007-02-01 23:39 ` Florian Effenberger
0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2007-02-01 14:41 UTC (permalink / raw)
To: Florian Effenberger; +Cc: jgarzik, linux-ide
Florian Effenberger wrote:
> Hi there,
>
> I am not sure if the sata_sil-Module is the culprit, but I think so. We
> ran a 2.6.15.1 kernel for months without any problems. Now I tried to
> switch to 2.6.19.2, and every night when our backup routine runs, the
> machine locks up. No log entry, no automatic reboot
> (/proc/sys/kernel/panic is set to 30 [sec]).
>
> The display looks as if there were a video error, with lots of garbled
> and flickering characters, so the error message is very hard to read.
> I have an image and a video of that, if you need them. The only thing I
> could make out is something containing "ata". After the reboot, the RAID
> devices are out of sync and need to resync.
>
> Booting up 2.6.15.1 on the same machine cures the problem, everything
> works just fine, no lockups.
>
> Is there any chance of debugging that problem, of finding the culprit?
>
> Any help would be greatly appreciated!
sata_sil hasn't seen as much change as other drivers and is one of the
more stable ones. I'm definitely interested. Please post the pics. If
the video contains useful info, can you host it somewhere?
--
tejun
* Re: Serious bug in sata_sil module in 2.6.19.2?
2007-02-01 14:41 ` Tejun Heo
@ 2007-02-01 23:39 ` Florian Effenberger
2007-02-06 6:30 ` Tejun Heo
0 siblings, 1 reply; 7+ messages in thread
From: Florian Effenberger @ 2007-02-01 23:39 UTC (permalink / raw)
To: Tejun Heo; +Cc: jgarzik, linux-ide
Hi there,
> sata_sil hasn't seen as much change as other drivers and is one of
> the more stable ones. I'm definitely interested. Please post the
> pics. If the video contains useful info, can you host it somewhere?
I've uploaded the image at
http://img482.imageshack.us/img482/6677/img0149rh5.jpg
and the video at
http://video.google.de/videoplay?docid=-3785898339758585695
Can anyone make out anything in the images?
(If you reply, please put me in Cc, I am not subscribed to the list.
Thanks!)
Thanks!
Florian
* Re: Serious bug in sata_sil module in 2.6.19.2?
2007-02-01 23:39 ` Florian Effenberger
@ 2007-02-06 6:30 ` Tejun Heo
2007-02-06 10:45 ` Florian Effenberger
0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2007-02-06 6:30 UTC (permalink / raw)
To: Florian Effenberger; +Cc: jgarzik, linux-ide
Florian Effenberger wrote:
> Hi there,
>
>> sata_sil hasn't seen as much change as other drivers and is one of
>> the more stable ones. I'm definitely interested. Please post the
>> pics. If the video contains useful info, can you host it somewhere?
>
> I've uploaded the image at
>
> http://img482.imageshack.us/img482/6677/img0149rh5.jpg
>
> and the video at
>
> http://video.google.de/videoplay?docid=-3785898339758585695
>
> Can anyone make out anything in the images?
>
> (If you reply, please put me in Cc, I am not subscribed to the list.
> Thanks!)
Those definitely look like libata error messages, but I can't tell
anything beyond that from the image. They could be the cause of the
system hang and the garbled screen, or just a symptom of another problem.
Is it possible for you to connect a serial console or configure
netconsole (Documentation/networking/netconsole.txt) so that the
messages are preserved after such a hang occurs? Also, please turn on
PRINTK_TIME (Kernel Hacking -> Show timing information on printks) so
that we can tell what happens when. To make the info more useful, you
can log into the machine from another machine and run something like
"while true; do sleep 1; date; done" on it so that you can tell exactly
when the machine went down.
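The timestamp loop above can be wrapped in a small function. This is
only a sketch: the name "heartbeat" and its two optional arguments
(interval in seconds, number of ticks) are made up for illustration and
are not part of any existing tool. It prints UTC timestamps so that the
output is easy to line up with logs from other machines.

```shell
#!/bin/sh
# heartbeat (hypothetical helper): print one UTC timestamp per interval.
#   $1 = seconds between timestamps (default 1)
#   $2 = number of timestamps to print; 0 means run until interrupted
heartbeat() {
    interval=${1:-1}
    count=${2:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        # The last line printed marks roughly when the machine was
        # still responding.
        date -u '+%Y-%m-%d %H:%M:%S UTC'
        i=$((i + 1))
        sleep "$interval"
    done
}
```

The function is meant to be called inside a login session opened from
a second machine, so the output scrolls on that machine's terminal and
survives the hang. When the session freezes, the last timestamp shown
is accurate to within one interval.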
Thanks.
--
tejun
* Re: Serious bug in sata_sil module in 2.6.19.2?
2007-02-06 6:30 ` Tejun Heo
@ 2007-02-06 10:45 ` Florian Effenberger
2007-02-06 14:44 ` Tejun Heo
0 siblings, 1 reply; 7+ messages in thread
From: Florian Effenberger @ 2007-02-06 10:45 UTC (permalink / raw)
To: Tejun Heo; +Cc: jgarzik, linux-ide
Hi there,
> Those definitely look like libata error messages, but I can't tell
> anything beyond that from the image. They could be the cause of the
> system hang and the garbled screen, or just a symptom of another problem.
>
> Is it possible for you to connect a serial console or configure
> netconsole (Documentation/networking/netconsole.txt) so that the
> messages are preserved after such a hang occurs? Also, please turn on
> PRINTK_TIME (Kernel Hacking -> Show timing information on printks) so
> that we can tell what happens when. To make the info more useful, you
> can log into the machine from another machine and run something like
> "while true; do sleep 1; date; done" on it so that you can tell exactly
> when the machine went down.
Thanks a lot for your feedback. It seems we solved the problem: it was
the power supply unit! We changed a few things (replaced the memory,
cleaned the machine and so on), but only after replacing the power
supply unit did everything work fine. So I guess the new kernel just
draws a little more power than the previous one.
Thanks a lot for your kind help, and sorry for the false alarm! :-)
Florian
* Re: Serious bug in sata_sil module in 2.6.19.2?
2007-02-06 10:45 ` Florian Effenberger
@ 2007-02-06 14:44 ` Tejun Heo
2007-02-06 16:14 ` Florian Effenberger
0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2007-02-06 14:44 UTC (permalink / raw)
To: Florian Effenberger; +Cc: jgarzik, linux-ide
Florian Effenberger wrote:
> Hi there,
>
>> Those definitely look like libata error messages, but I can't tell
>> anything beyond that from the image. They could be the cause of the
>> system hang and the garbled screen, or just a symptom of another problem.
>>
>> Is it possible for you to connect a serial console or configure
>> netconsole (Documentation/networking/netconsole.txt) so that the
>> messages are preserved after such a hang occurs? Also, please turn on
>> PRINTK_TIME (Kernel Hacking -> Show timing information on printks) so
>> that we can tell what happens when. To make the info more useful, you
>> can log into the machine from another machine and run something like
>> "while true; do sleep 1; date; done" on it so that you can tell exactly
>> when the machine went down.
>
> Thanks a lot for your feedback. It seems we solved the problem: it was
> the power supply unit! We changed a few things (replaced the memory,
> cleaned the machine and so on), but only after replacing the power
> supply unit did everything work fine. So I guess the new kernel just
> draws a little more power than the previous one.
>
> Thanks a lot for your kind help, and sorry for the false alarm! :-)
Yep, when power quality degrades, the first thing that breaks is SATA,
so that explains the error messages. There have been several SATA bug
reports which turned out to be PSU problems. Good to have another
clear data point on that. :-)
--
tejun
* Re: Serious bug in sata_sil module in 2.6.19.2?
2007-02-06 14:44 ` Tejun Heo
@ 2007-02-06 16:14 ` Florian Effenberger
0 siblings, 0 replies; 7+ messages in thread
From: Florian Effenberger @ 2007-02-06 16:14 UTC (permalink / raw)
To: Tejun Heo; +Cc: jgarzik, linux-ide
Hi there,
> Yep, when power quality degrades, the first thing that breaks is SATA,
> so that explains the error messages. There have been several SATA bug
> reports which turned out to be PSU problems. Good to have another
> clear data point on that. :-)
Thanks for clarifying. This was the first time I ran into that, and now
I'm much wiser. :-)
Thanks for your fast support, much appreciated!
Florian
Thread overview: 7+ messages
2007-02-01 10:24 Serious bug in sata_sil module in 2.6.19.2? Florian Effenberger
2007-02-01 14:41 ` Tejun Heo
2007-02-01 23:39 ` Florian Effenberger
2007-02-06 6:30 ` Tejun Heo
2007-02-06 10:45 ` Florian Effenberger
2007-02-06 14:44 ` Tejun Heo
2007-02-06 16:14 ` Florian Effenberger