linux-kernel.vger.kernel.org archive mirror
* JMB363 false hotplug detections (was: ahci: AHCI and RAID mode SATA patch for Intel Cougar Point DeviceIDs)
@ 2010-01-16 18:10 Robert Hancock
  2010-01-16 18:34 ` JMB363 false hotplug detections Krzysztof Halasa
  2010-01-17  2:07 ` JMB363 false hotplug detections (was: ahci: AHCI and RAID mode SATA patch for Intel Cougar Point DeviceIDs) Tejun Heo
  0 siblings, 2 replies; 5+ messages in thread
From: Robert Hancock @ 2010-01-16 18:10 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Jeff Garzik, Seth Heasley, linux-ide, linux-kernel, Tejun Heo

On Sat, Jan 16, 2010 at 12:02 PM, Krzysztof Halasa <khc@pm.waw.pl> wrote:
> Robert Hancock <hancockrwd@gmail.com> writes:
>
>> Hmm.. From those test results I really suspect some kind of hardware
>> fault. Could be a defective motherboard - I don't know if that chip
>> needs any terminating resistors on the motherboard for the SATA signal
>> lines or something, if so, could be they weren't installed properly..
>
> Unfortunately I can't find a JMB363 datasheet on the net, but there is
> a certain mb (p965t-a) schematic available.
> It seems JMB363 doesn't need terminators on SATA RX/TX lines, there is
> capacitative coupling only (10 nF in each line).
>
> The port in question (SATA#2) on my mb (P45 Neo2) uses pins 56 (RXP)
> 57 (RXN) and 60 (TXN) 61 (TXP). No visible irregularity, the traces look
> like they should, go straight to the capacitors, and then to 0R R-packs
> and to the connector. It looks exactly the same for both ports. There is
> no short circuit past the capacitors (from the connector side). I'd say
> quite low probability that there is something wrong with these signals.
>
> It seems the chip uses extra 12k resistors for SATA (p965t-a calls the
> pins SJ_REXT[12]), pin 44 for port#1 and 55 for port#2. Both look sane.
> I will check the suspected connections with the machine powered off
> later.
>
> The RX and TX trace pairs run next to each other for up to 10 mm; could
> that be a problem at these frequencies? If so, surely it would show up
> on all or many such boards? I can't find any such report.
>
> OTOH other people have similar problems with other boards: e.g.
> ASUS P5KC: http://ubuntuforums.org/showthread.php?t=766217
>           https://bugs.launchpad.net/ubuntu/+source/linux/+bug/377633
>
> (unknown boards)
> https://archlinux-fr.org/doku.php?id=securisation:logcheck
> http://forum.ubuntu-fr.org/viewtopic.php?pid=2739616
> http://ubuntu-ky.ubuntuforums.org/showthread.php?p=7243061
> The last one claims:
> this started after an upgrade to ubuntu 9.04 and is still here after re-installing
> ubuntu 8.10.
> this was fixed by re-installing ubuntu 8.10 only using the kernel,
> 2.6.27-7-generic.
> I don't know if JMB36x is involved in this case, and how reliable the
> info is.
>
> Investigating as time permits.

Well, it is possible there is some kind of flaw in the JMB363 chip
itself that causes this problem. (It could happen in Windows too; I
don't think Windows drivers normally report these kinds of events
anywhere, and if it never reached the point of actually deciding a
device was connected, you likely wouldn't notice.) I suppose we could
add a workaround in the driver to ignore hotplug events, but then real
hotplug events wouldn't get handled properly.

What revision does your JMB363 report in lspci? Mine shows:

03:00.0 SATA controller: JMicron Technologies, Inc. 20360/20363 Serial
ATA Controller (rev 03) (prog-if 01 [AHCI 1.0])

Tejun, any other ideas?


* Re: JMB363 false hotplug detections
  2010-01-16 18:10 JMB363 false hotplug detections (was: ahci: AHCI and RAID mode SATA patch for Intel Cougar Point DeviceIDs) Robert Hancock
@ 2010-01-16 18:34 ` Krzysztof Halasa
  2010-01-17  2:07 ` JMB363 false hotplug detections (was: ahci: AHCI and RAID mode SATA patch for Intel Cougar Point DeviceIDs) Tejun Heo
  1 sibling, 0 replies; 5+ messages in thread
From: Krzysztof Halasa @ 2010-01-16 18:34 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Jeff Garzik, Seth Heasley, linux-ide, linux-kernel, Tejun Heo

Robert Hancock <hancockrwd@gmail.com> writes:

> What revision does your JMB363 report in lspci? Mine shows:
>
> 03:00.0 SATA controller: JMicron Technologies, Inc. 20360/20363 Serial
> ATA Controller (rev 03) (prog-if 01 [AHCI 1.0])

Same here.
-- 
Krzysztof Halasa


* Re: JMB363 false hotplug detections (was: ahci: AHCI and RAID mode SATA patch for Intel Cougar Point DeviceIDs)
  2010-01-16 18:10 JMB363 false hotplug detections (was: ahci: AHCI and RAID mode SATA patch for Intel Cougar Point DeviceIDs) Robert Hancock
  2010-01-16 18:34 ` JMB363 false hotplug detections Krzysztof Halasa
@ 2010-01-17  2:07 ` Tejun Heo
  2010-01-17 20:19   ` JMB363 false hotplug detections Krzysztof Halasa
  1 sibling, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2010-01-17  2:07 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Krzysztof Halasa, Jeff Garzik, Seth Heasley, linux-ide, linux-kernel

On 01/17/2010 03:10 AM, Robert Hancock wrote:
> 03:00.0 SATA controller: JMicron Technologies, Inc. 20360/20363 Serial
> ATA Controller (rev 03) (prog-if 01 [AHCI 1.0])
> 
> Tejun, any other ideas?

Can someone point me to the original thread?

Thanks.

-- 
tejun


* Re: JMB363 false hotplug detections
  2010-01-17  2:07 ` JMB363 false hotplug detections (was: ahci: AHCI and RAID mode SATA patch for Intel Cougar Point DeviceIDs) Tejun Heo
@ 2010-01-17 20:19   ` Krzysztof Halasa
  2010-01-19  8:57     ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Krzysztof Halasa @ 2010-01-17 20:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Robert Hancock, Jeff Garzik, Seth Heasley, linux-ide, linux-kernel

Tejun Heo <tj@kernel.org> writes:

> Can someone point me to the original thread?

Sure:

http://lkml.org/lkml/2010/1/13/342
http://lkml.org/lkml/2010/1/15/245

BTW setting the JMB363 mode in BIOS setup from IDE to AHCI or RAID
(thus enabling JMB363 BIOS) changes nothing.

The only weird thing is that some time ago the problems weren't there.
It could be a genuine hardware problem. I have full kernel logs. Sometimes
the same kernel (build) is "good" at one time and "bad" at another.

I have booted the board (with JMB363 and the driver enabled) 57 times.
Out of these, there were no problems 16 times (date-time-result):

08-18-start-good
08-18-23:21-good
08-20-12:21-good
08-20-12:41-good
08-20-13:40-good
08-20-13:52-good
08-20-13:54-good
08-20-14:06-good
08-20-16:16-good
08-20-20:53-good
08-21-11:15-bad
08-22-12:10-good (uncertain, the kernel had other problems)
08-22-12:12-bad
08-22-22:36-bad
08-22-22:39-bad
08-22-22:40-bad
08-23-12:53-good
08-23-13:31-good
08-23-19:22-good
08-24-14:44-bad
08-25-12:29-bad
08-26-12:34-bad
08-27-12:43-bad
08-27-12:51-bad
08-27-18:26-bad
08-28-12:10-bad
08-29-11:38-bad
08-30-22:13-bad
08-31-12:50-bad
09-01-12:53-bad
09-01-17:47-good
09-02-12:42-bad
09-02-18:27-bad
09-03-11:56-bad
09-04-13:40-bad
09-04-22:23-bad
09-05-21:42-good
09-06-16:42-bad
09-07-14:13-bad
09-08-14:21-bad
09-09-13:47-bad
09-10-12:24-bad
09-10-23:33-bad
09-11-14:07-bad
09-12-01:43-bad
09-12-12:29-bad
09-12-23:11-bad
09-12-23:42-bad
09-13-01:11-bad
09-13-13:45-bad
09-13-17:53-bad
09-13-18:01-bad
09-13-19:13-bad
10-18-00:59-bad
10-18-17:07-bad
10-18-17:14-bad
10-18-17:19-bad
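
The tally above can be reproduced with a few lines of Python. This is
only a sketch: tally_boots is a hypothetical helper name, and the sample
is a handful of entries copied from the list, not the full set of 57.

```python
from collections import Counter


def tally_boots(entries):
    """Count results from 'MM-DD-HH:MM-result' entries (hypothetical helper)."""
    counts = Counter()
    for entry in entries:
        # Drop any trailing note such as "(uncertain, ...)".
        tag = entry.split()[0]
        # The result is whatever follows the last dash.
        result = tag.rsplit("-", 1)[1]
        counts[result] += 1
    return counts


# A few entries taken from the list above, not the whole log.
sample = [
    "08-18-start-good",
    "08-21-11:15-bad",
    "08-22-12:10-good (uncertain, the kernel had other problems)",
    "08-22-12:12-bad",
    "09-01-12:53-bad",
    "09-01-17:47-good",
]

counts = tally_boots(sample)
print(counts["good"], counts["bad"])
```

Feeding it the whole list instead of the sample should give the
good/bad split quoted above.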

There are no significant kernel log differences between *good and *bad
(excluding the AHCI messages). Sometimes the exceptions were sporadic,
as in the 09-01-12:53-bad case:

Sep  1 12:53:38 Machine booted
Sep  1 13:02:33 ata8: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
Sep  1 13:02:33 ata8: irq_stat 0x00000040, connection status changed
Sep  1 13:02:33 ata8: SError: { CommWake DevExch }
Sep  1 13:02:33 ata8: hard resetting link
Sep  1 13:02:34 ata8: SATA link down (SStatus 0 SControl 300)
Sep  1 13:02:34 ata8: EH complete
Sep  1 15:47:12 Machine rebooted
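
The SErr and irq_stat values in the log above can also be decoded by
hand. A minimal sketch, assuming the SError DIAG bit layout (bits 16 to
26) from the SATA specification and the short labels libata prints.
Only CommWake and DevExch are confirmed by the log itself; the other
labels in the table are recalled from the libata source and should be
checked against it. decode_serror and the two constant names are
illustrative, and bit 6 of irq_stat is assumed to be the AHCI
port-connect-change bit.

```python
# SError DIAG bits with libata-style labels (assumed; see the note above).
SERR_DIAG_BITS = {
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}

# AHCI per-port interrupt bit for "connection status changed" (assumed).
PORT_IRQ_CONNECT = 1 << 6


def decode_serror(serr):
    """Return the labels of all known DIAG bits set in an SError value."""
    return [
        name
        for bit, name in sorted(SERR_DIAG_BITS.items())
        if serr & (1 << bit)
    ]


# Values taken from the ata8 exception in the log above.
flags = decode_serror(0x4040000)
connect_changed = bool(0x00000040 & PORT_IRQ_CONNECT)
print(flags, connect_changed)
```

For the values in this log the sketch reports CommWake and DevExch with
the connect-change bit set, which agrees with what the kernel printed.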

Perhaps I should really check these resistors around the JMB363 chip,
and maybe using a vacuum cleaner is a good idea? I think I will.

It's certain there was nothing connected to the JMB363 SATA. I don't know
the BIOS versions and CMOS (BIOS) configs.
-- 
Krzysztof Halasa


* Re: JMB363 false hotplug detections
  2010-01-17 20:19   ` JMB363 false hotplug detections Krzysztof Halasa
@ 2010-01-19  8:57     ` Tejun Heo
  0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2010-01-19  8:57 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Robert Hancock, Jeff Garzik, Seth Heasley, linux-ide, linux-kernel

Hello,

On 01/18/2010 05:19 AM, Krzysztof Halasa wrote:
> http://lkml.org/lkml/2010/1/13/342
> http://lkml.org/lkml/2010/1/15/245

Definitely looks like an electrical problem to me.  The controller is
repeatedly reporting spurious hotplug events and the problem is not
universal to the controller either.  I've played with several
different JMB363s and they all worked just fine.  It would be
interesting to see whether the problem is reproducible on different
boards of the same model.

> BTW setting the JMB363 mode in BIOS setup from IDE to AHCI or RAID
> (thus enabling JMB363 BIOS) changes nothing.

That's expected.  The controller is always put into AHCI mode during
initialization regardless of the mode programmed by the BIOS.

> The only weird thing is that some time ago the problems weren't there.
> It could be a genuine hardware problem. I have full kernel logs. Sometimes
> the same kernel (build) is "good" at one time and "bad" at another.
> 
> I have booted the board (with JMB363 and the driver enabled) 57 times.
> Out of these, there were no problems 16 times (date-time-result):
...
> There are no significant kernel log differences between *good and *bad
> (excluding the AHCI messages). Sometimes the exceptions were sporadic,
> as in the 09-01-12:53-bad case:
> 
> Sep  1 12:53:38 Machine booted
> Sep  1 13:02:33 ata8: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
> Sep  1 13:02:33 ata8: irq_stat 0x00000040, connection status changed
> Sep  1 13:02:33 ata8: SError: { CommWake DevExch }
> Sep  1 13:02:33 ata8: hard resetting link
> Sep  1 13:02:34 ata8: SATA link down (SStatus 0 SControl 300)
> Sep  1 13:02:34 ata8: EH complete
> Sep  1 15:47:12 Machine rebooted
> 
> Perhaps I should really check these resistors around the JMB363 chip,
> and maybe using a vacuum cleaner is a good idea? I think I will.
> 
> It's certain there was nothing connected to the JMB363 SATA. I don't know
> the BIOS versions and CMOS (BIOS) configs.

They don't matter.  Once the OS takes over, the controller is forced
into multi-function AHCI mode and the kernel version wouldn't have any
effect on it either.  That part of the code hasn't changed for years now.
So, yeah, looks like a genuine hardware problem to me.

Thanks.

-- 
tejun

