* sata_via bus errors fixed?
@ 2011-01-24 16:06 Dave Howorth
  2011-01-24 17:04 ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Howorth @ 2011-01-24 16:06 UTC (permalink / raw)
  To: linux-ide

I'm not a kernel expert, so apologies if I misstep. I've recently
experienced many errors with a new disk and SATA controller, which have
apparently disappeared when a recent kernel is used. I'd welcome any
informed opinions about whether the system should be stable in this
configuration.

Short story first.

I have a fairly old system and one of the SATA 1 drives is failing so I
bought a replacement, which is SATA 2. The system didn't see it at all
so I rushed out and bought another drive (Xmas panic!) but it wasn't
seen either. Google told me it was a problem with the SATA chip on the
motherboard, so I borrowed a SATA adapter. It kind of worked but gave
lots of bus errors. I tried a couple of other adapters with similar
results. I tested the drives and they're perfect. I read about power
issues with SATA so I disconnected everything I could but that made no
difference. I tried with several cables, both data and power. So I
started looking for a new system.

Then I seem to have got lucky. I've been running openSUSE 11.2 and have
also tested Ubuntu 10.04 with similar results. Recently I tried Knoppix
6.4.3 and apparently everything worked perfectly. I still need to do
more testing but maybe my old system can live a while longer.

I'd be interested in any views people have about the prognosis.

OK, now here are the details:

The mobo is an MSI K8M Neo-V, which has two SATA 1.5 Gbps ports
controlled by a VIA VT6420 chip, which can't see 3 Gbps drives.

The failing drive is a Seagate 1.5 Gbps. The new drives are both Samsung
3 Gbps SATA drives; a 1 TB HD103SJ and a 320 GB. The smaller drive has a
jumper to force 1.5 Gbps speed, while the larger one uses a software
utility.

I borrowed a PCI adapter based on the Sil 3512 and I've bought one based
on the VIA VT6421A. Both of these adapters are limited to 1.5 Gbps.

The output from lspci (with the Sil controller) looked like this:

00:00.0 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge
[K8T800/K8T890 South]
00:0b.0 Mass storage controller: Silicon Image, Inc. SiI 3512
[SATALink/SATARaid] Serial ATA Controller (rev 01)
00:0c.0 FireWire (IEEE 1394): Texas Instruments TSB12LV26 IEEE-1394
Controller (Link)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID
Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge
[KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc.
VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II]
(rev 78)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation NV44A [GeForce
6200] (rev a1)

Typical error messages were like this:

Jan  5 22:53:27 piglet kernel: [  157.040095] ata5: hard resetting link
Jan  5 22:53:27 piglet kernel: [  157.390039] ata5: SATA link up 1.5
Gbps (SStatus 113 SControl 310)
Jan  5 22:53:32 piglet kernel: [  162.390035] ata5: hard resetting link
Jan  5 22:53:33 piglet kernel: [  162.740037] ata5: SATA link up 1.5
Gbps (SStatus 113 SControl 310)
Jan  5 22:53:33 piglet kernel: [  162.780287] ata5.00: configured for
UDMA/100
Jan  5 22:53:33 piglet kernel: [  162.780294] ata5.00: device reported
invalid CHS sector 0
Jan  5 22:53:33 piglet kernel: [  162.780302] ata5: EH complete
Jan  5 22:54:03 piglet kernel: [  193.040089] ata5: hard resetting link
Jan  5 22:54:03 piglet kernel: [  193.390060] ata5: SATA link up 1.5
Gbps (SStatus 113 SControl 310)
Jan  5 22:54:03 piglet kernel: [  193.430287] ata5.00: configured for
UDMA/100
Jan  5 22:54:03 piglet kernel: [  193.430295] ata5.00: device reported
invalid CHS sector 0
Jan  5 22:54:03 piglet kernel: [  193.430308] ata5: EH complete
Jan  5 22:54:07 piglet kernel: [  197.042033] ata5.00: limiting speed to
UDMA/66:PIO4
Jan  5 22:54:07 piglet kernel: [  197.042070] ata5: hard resetting link
Jan  5 22:54:07 piglet kernel: [  197.390059] ata5: SATA link up 1.5
Gbps (SStatus 113 SControl 310)
Jan  5 22:54:07 piglet kernel: [  197.430288] ata5.00: configured for
UDMA/66
Jan  5 22:54:07 piglet kernel: [  197.430305] ata5: EH complete
Jan  5 22:54:08 piglet kernel: [  197.821413] ata5.00: configured for
UDMA/66
Jan  5 22:54:08 piglet kernel: [  197.821437] ata5: EH complete
Jan  5 22:54:38 piglet kernel: [  228.040099] ata5: hard resetting link
Jan  5 22:54:38 piglet kernel: [  228.390046] ata5: SATA link up 1.5
Gbps (SStatus 113 SControl 310)
Jan  5 22:54:38 piglet kernel: [  228.430286] ata5.00: configured for
UDMA/66
Jan  5 22:54:38 piglet kernel: [  228.430309] ata5: EH complete

You can see that it steadily reduces the bus speed.
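
The SStatus and SControl numbers in the link-up lines above are register values printed in hex. A minimal sketch of splitting them into fields, assuming the layout given in the Serial ATA specification (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8); the function name is made up for illustration:

```python
def decode_sata_reg(value_hex: str) -> dict:
    """Split an SStatus/SControl value (hex string, as printed in the log) into fields."""
    value = int(value_hex, 16)
    return {
        "det": value & 0xF,         # device detection
        "spd": (value >> 4) & 0xF,  # negotiated speed (SStatus) or speed limit (SControl)
        "ipm": (value >> 8) & 0xF,  # interface power management
    }

sstatus = decode_sata_reg("113")
scontrol = decode_sata_reg("310")
print(sstatus)   # {'det': 3, 'spd': 1, 'ipm': 1}
print(scontrol)  # {'det': 0, 'spd': 1, 'ipm': 3}
```

Read that way, SStatus 113 says a device is present with communication established (DET 3), the link negotiated at 1.5 Gbps (SPD 1) and the interface is active (IPM 1). The SATA link rate stays at 1.5 Gbps throughout; what the "limiting speed to UDMA/66:PIO4" lines lower is the transfer mode.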


With the new VIA adapter, lspci shows:

00:00.0 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8M800 Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge
[K8T800/K8T890 South]
00:0b.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE RAID
Controller (rev 50)
00:0c.0 FireWire (IEEE 1394): Texas Instruments TSB12LV26 IEEE-1394
Controller (Link)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID
Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge
[KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc.
VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II]
(rev 78)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation NV44A [GeForce
6200] (rev a1)


and error messages looked like this:

Jan 12 20:51:18 piglet kernel: [  109.441492] ata2.00: exception Emask
0x12 SAct 0x0 SErr 0x1000500 action 0x6
Jan 12 20:51:18 piglet kernel: [  109.441517] ata2.00: BMDMA stat 0x5
Jan 12 20:51:18 piglet kernel: [  109.441525] ata2: SError: {
UnrecovData Proto TrStaTrns }
Jan 12 20:51:18 piglet kernel: [  109.441539] ata2.00: cmd
c8/00:f0:58:05:57/00:00:00:00:00/e1 tag 0 dma 122880 in
Jan 12 20:51:18 piglet kernel: [  109.441541]          res
51/84:48:00:00:00/84:58:00:00:00/e0 Emask 0x12 (ATA bus error)
Jan 12 20:51:18 piglet kernel: [  109.441555] ata2.00: status: { DRDY ERR }
Jan 12 20:51:18 piglet kernel: [  109.441561] ata2.00: error: { ICRC ABRT }
Jan 12 20:51:18 piglet kernel: [  109.441575] ata2: hard resetting link
Jan 12 20:51:18 piglet kernel: [  109.746050] ata2: SATA link up 1.5
Gbps (SStatus 113 SControl 310)
Jan 12 20:51:18 piglet kernel: [  109.784313] ata2.00: configured for
UDMA/33
Jan 12 20:51:18 piglet kernel: [  109.784337] ata2: EH complete

The errors didn't seem to cause any data corruption.
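
The hex values in the ata2 log above can be cross-checked against the names the kernel printed next to them. A minimal sketch, assuming the bit positions below (recalled from libata and the ATA error register definition, and listing only the bits that occur in this log); the helper name is hypothetical:

```python
# Assumed bit positions; only the bits seen in the log above are listed.
SERR_BITS = {8: "UnrecovData", 10: "Proto", 24: "TrStaTrns"}  # SError register
ERR_BITS = {2: "ABRT", 7: "ICRC"}                             # ATA error register

def decode_bits(value: int, names: dict) -> list:
    """Return the names of the set bits, lowest bit first; unnamed bits show as 'bitN'."""
    return [names.get(bit, f"bit{bit}") for bit in range(32) if value & (1 << bit)]

print(decode_bits(0x1000500, SERR_BITS))  # ['UnrecovData', 'Proto', 'TrStaTrns']
print(decode_bits(0x84, ERR_BITS))        # ['ABRT', 'ICRC']

# The sector count field of the command (f0) times 512 bytes per sector
# matches the "dma 122880 in" transfer size on the same log line.
print(0xF0 * 512)                         # 122880
```

ICRC is an interface CRC error, so the failing command was a 240-sector DMA read whose data was damaged in transit between drive and controller rather than on the platters.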

Oh and the kernel versions are:
openSUSE 11.2  2.6.31
ubuntu 10.04   2.6.32
knoppix 6.4.3  2.6.36


Looking at the kernel changelogs I see a 'magic patch' from Joseph Chan
that was applied between .32 and .36. It is described as improving
behaviour with WD drives while mine are Samsung. But looking at the
kernel bugzilla, it seemed to my tyro eyes that the symptoms are similar.

So I'm curious whether:
(1) My case is support for wider usefulness of the 'magic patch', or
(2) there was some other kernel change that explains the improved
behaviour on my system, or
(3) I've misunderstood the evidence and there's something else going on.

Regards, Dave



* Re: sata_via bus errors fixed?
  2011-01-24 16:06 sata_via bus errors fixed? Dave Howorth
@ 2011-01-24 17:04 ` Tejun Heo
  2011-01-24 17:23   ` Dave Howorth
  2011-01-25 10:35   ` Dave Howorth
  0 siblings, 2 replies; 15+ messages in thread
From: Tejun Heo @ 2011-01-24 17:04 UTC (permalink / raw)
  To: Dave Howorth; +Cc: linux-ide

Hello,

On Mon, Jan 24, 2011 at 04:06:31PM +0000, Dave Howorth wrote:
> Jan  5 22:54:38 piglet kernel: [  228.040099] ata5: hard resetting link
> Jan  5 22:54:38 piglet kernel: [  228.390046] ata5: SATA link up 1.5
> Gbps (SStatus 113 SControl 310)
> Jan  5 22:54:38 piglet kernel: [  228.430286] ata5.00: configured for
> UDMA/66
> Jan  5 22:54:38 piglet kernel: [  228.430309] ata5: EH complete
> 
> You can see that it steadily reduces the bus speed.

Hmmm... there is no message which shows why EH kicked in.  Weird.  Can
you please post the output of dmesg instead after those failures?

> With the new VIA adapter, lspci shows:
> Jan 12 20:51:18 piglet kernel: [  109.441492] ata2.00: exception Emask
> 0x12 SAct 0x0 SErr 0x1000500 action 0x6
> Jan 12 20:51:18 piglet kernel: [  109.441517] ata2.00: BMDMA stat 0x5
> Jan 12 20:51:18 piglet kernel: [  109.441525] ata2: SError: {
> UnrecovData Proto TrStaTrns }
> Jan 12 20:51:18 piglet kernel: [  109.441539] ata2.00: cmd
> c8/00:f0:58:05:57/00:00:00:00:00/e1 tag 0 dma 122880 in
> Jan 12 20:51:18 piglet kernel: [  109.441541]          res
> 51/84:48:00:00:00/84:58:00:00:00/e0 Emask 0x12 (ATA bus error)
> Jan 12 20:51:18 piglet kernel: [  109.441555] ata2.00: status: { DRDY ERR }
> Jan 12 20:51:18 piglet kernel: [  109.441561] ata2.00: error: { ICRC ABRT }
> Jan 12 20:51:18 piglet kernel: [  109.441575] ata2: hard resetting link
> Jan 12 20:51:18 piglet kernel: [  109.746050] ata2: SATA link up 1.5
> Gbps (SStatus 113 SControl 310)
> Jan 12 20:51:18 piglet kernel: [  109.784313] ata2.00: configured for
> UDMA/33
> Jan 12 20:51:18 piglet kernel: [  109.784337] ata2: EH complete
> 
> The errors didn't seem to cause any data corruption.
> 
> Oh and the kernel versions are:
> openSUSE 11.2  2.6.31
> ubuntu 10.04   2.6.32
> knoppix 6.4.3  2.6.36
> 
> Looking at the kernel changelogs I see a 'magic patch' from Joseph Chan
> that was applied between .32 and .36. It is described as improving
> behaviour with WD drives while mine are Samsung. But looking at the
> kernel bugzilla, it seemed to my tyro eyes that the symptoms are similar.
> 
> So I'm curious whether:
> (1) My case is support for wider usefulness of the 'magic patch', or

I doubt it.  The problem is via specific and you seem to be
experiencing similar problem on the sil controller too.

> (2) there was some other kernel change that explains the improved
>     behaviour on my system, or

AFAIK, nope.

> (3) I've misunderstood the evidence and there's something else going on.

It seems like the hardware definitely is flaky.  SATA is one of the
first things which malfunction when the system has interference
issues.  I have no idea why the new kernel makes it happier tho.

Thanks.

-- 
tejun


* Re: sata_via bus errors fixed?
  2011-01-24 17:04 ` Tejun Heo
@ 2011-01-24 17:23   ` Dave Howorth
  2011-01-25 10:35   ` Dave Howorth
  1 sibling, 0 replies; 15+ messages in thread
From: Dave Howorth @ 2011-01-24 17:23 UTC (permalink / raw)
  To: linux-ide

Tejun Heo wrote:
> Hmmm... there is no message which shows why EH kicked in.  Weird.  Can
> you please post the output of dmesg instead after those failures?

Thanks for the fast response. Sorry. I posted too short a sample. I'll
see what I can do this evening.

Cheers, Dave


* Re: sata_via bus errors fixed?
  2011-01-24 17:04 ` Tejun Heo
  2011-01-24 17:23   ` Dave Howorth
@ 2011-01-25 10:35   ` Dave Howorth
  2011-01-25 10:50     ` Tejun Heo
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Howorth @ 2011-01-25 10:35 UTC (permalink / raw)
  Cc: linux-ide

Tejun Heo wrote:
> Hmmm... there is no message which shows why EH kicked in.  Weird.  Can
> you please post the output of dmesg instead after those failures?

Hello again.

I gave back the borrowed Sil 3512 adapter so I can't provide more data
for that. I never ran that with Knoppix so I don't know whether it would
be 'mended' or not.

Last night I booted with the VIA 6421A adapter and the 1 TB drive and
recorded various data. They're a bit big to put in an email so I've put
them on pastebin; I hope that's OK. They're at:

dmesg               http://pastebin.com/FF3ZtcXy
lspci               http://pastebin.com/u5y8XrWm
smartctl-a-dev-sda  http://pastebin.com/m5YNRw2t
var-log-messages    http://pastebin.com/mVrkyTF2

>> So I'm curious whether:
>> (1) My case is support for wider usefulness of the 'magic patch', or
> 
> I doubt it.  The problem is via specific and you seem to be
> experiencing similar problem on the sil controller too.

The sil problems were similar in effect, in that they both produced a
flaky system, but the error messages were different in detail. I'm not
qualified to tell whether the differences were significant or not.

Unfortunately I've returned the borrowed Sil card and the priority for
me now is to see whether I can trust my system using the VIA card. I
could probably borrow the Sil card again if it would be helpful.

> It seems like the hardware definitely is flaky.  SATA is one of the
> first things which malfunction when the system has interference
> issues.  I have no idea why the new kernel makes it happier tho.

Yes, a hardware glitch does seem the strongest candidate. But the newish
kernel does seem to make it completely solid AFAICT. Cash-wise my
alternative is a new mobo, cpu, ram & psu so there's a bit of an
incentive to persevere. But mostly I guess it's just a perverse desire
to understand what's going on that is driving me.

I suppose I could try various kernel versions to see if it's possible to
isolate one change that causes the improved behaviour. That would be a
bit of a learning curve and I'm not sure how much time it would need.

Cheers, Dave


* Re: sata_via bus errors fixed?
  2011-01-25 10:35   ` Dave Howorth
@ 2011-01-25 10:50     ` Tejun Heo
  2011-01-26 10:00       ` Dave Howorth
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2011-01-25 10:50 UTC (permalink / raw)
  To: Dave Howorth; +Cc: linux-ide

Hello,

(please always reply-to-all)

On Tue, Jan 25, 2011 at 10:35:29AM +0000, Dave Howorth wrote:
> dmesg               http://pastebin.com/FF3ZtcXy
> lspci               http://pastebin.com/u5y8XrWm
> smartctl-a-dev-sda  http://pastebin.com/m5YNRw2t
> var-log-messages    http://pastebin.com/mVrkyTF2

Hmmm....

> Yes, a hardware glitch does seem the strongest candidate. But the newish
> kernel does seem to make it completely solid AFAICT. Cash-wise my
> alternative is a new mobo, cpu, ram & psu so there's a bit of an
> incentive to persevere. But mostly I guess it's just a perverse desire
> to understand what's going on that is driving me.

Oh, I love bug reporters with such desire.  I love you. :-)

> I suppose I could try various kernel versions to see if it's possible to
> isolate one change that causes the improved behaviour. That would be a
> bit of a learning curve and I'm not sure how much time it would need.

Yes, that definitely would be a good idea.  The first thing I would
try is disabling the VIA FIFO workaround completely and see whether
that makes any difference.  ie. Comment out the last section of
svia_configure() in the latest kernel and see if the problems occur
again.
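
For orientation, the workaround in question is a read-modify-write that sets one bit in the controller's PCI configuration space. A minimal Python model of that operation, assuming (from recollection of the comment in sata_via.c, not from anything in this thread) that the register is at offset 0x52 and the enable bit is bit 2; both constants and both function names are illustrative:

```python
FIFO_REG_OFFSET = 0x52        # assumption: "Rx52" in the driver comment
FIFO_WATERMARK_BIT = 1 << 2   # assumption: bit 2 enables the adjusted FIFO watermark

def fifo_fix_applied(pci_config: bytes) -> bool:
    """True if the watermark bit is set in a dump of PCI config space."""
    return bool(pci_config[FIFO_REG_OFFSET] & FIFO_WATERMARK_BIT)

def apply_fifo_fix(pci_config: bytes) -> bytes:
    """Model of the driver's read-modify-write: set the bit, leave the rest alone."""
    cfg = bytearray(pci_config)
    cfg[FIFO_REG_OFFSET] |= FIFO_WATERMARK_BIT
    return bytes(cfg)

blank = bytes(256)  # stand-in for a real config-space dump
print(fifo_fix_applied(blank))                  # False
print(fifo_fix_applied(apply_fifo_fix(blank)))  # True
```

On a running system the same check could be made on the bytes of the device's sysfs `config` file (the adapter sits at 00:0b.0 in the lspci listing), which typically needs root to read beyond the first 64 bytes. That gives a way to confirm whether a given kernel actually applied the workaround.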

If that's the case, it could be that you were seeing two separate
problems on via and sil.  Digging down what sil was complaining about
would be interesting in that case.

If that's not the case, it could be something which seems unrelated -
some ACPI or cpufreq change or whatnot.  The best way to find out
would be bisection.
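
Bisection here means a binary search over kernel versions for the first one in which the problem is gone. A minimal sketch of the idea, assuming the versions are ordered oldest to newest, the oldest is broken and the newest is fixed; the function name and the pretend outcome are hypothetical:

```python
def first_fixed(versions, is_fixed):
    """Binary search for the first version where is_fixed(version) is True.

    Returns (version, number_of_tests_run).
    """
    lo, hi = 0, len(versions) - 1  # lo: known broken, hi: known fixed
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_fixed(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi], tests

versions = ["2.6.31", "2.6.32", "2.6.33", "2.6.34", "2.6.35", "2.6.36", "2.6.37"]
# Hypothetical outcome, for illustration only: pretend the fix landed in 2.6.34.
pretend_fixed = lambda v: versions.index(v) >= versions.index("2.6.34")
print(first_fixed(versions, pretend_fixed))  # ('2.6.34', 3)
```

Each test is one build and boot, and the number of tests grows with the logarithm of the range, which is what makes the approach practical. `git bisect` does the same over individual commits; when hunting for a fix rather than a regression, the usual good/bad labels are swapped.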

Thanks.

-- 
tejun


* Re: sata_via bus errors fixed?
  2011-01-25 10:50     ` Tejun Heo
@ 2011-01-26 10:00       ` Dave Howorth
  2011-01-26 10:11         ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Howorth @ 2011-01-26 10:00 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide

Tejun Heo wrote:
> (please always reply-to-all)

OK :)

> Yes, that definitely would be a good idea.  The first thing I would
> try is disabling the VIA FIFO workaround completely and see whether
> that makes any difference.

Just a quick progress note. I'm most familiar with Suse and not familiar
with kernel builds so last night I installed the pre-built
Kernel:/HEAD/11.2 kernel from the opensuse build service. It's 2.6.37.

It appeared to run without problem* so it seems that whatever fixed my
issue was at least not specific to the Knoppix kernel. FWIW, I've put
the dmesg at:

http://pastebin.com/bWUNr8Ve

It was late so all I did was copy a 1.7 GB file to and from the new disk
on the VT6421A controller. No errors were logged. Incidentally, what's
the best way of testing for this type of reliability?

Tonight I plan to compile that kernel, which should confirm that I
understand how to do it, and then I'll disable the VIA FIFO workaround
as you suggest and see what happens.

Cheers, Dave

* PS There was one oddity, which was that Evolution crashed at around
the same moment as I was mounting a logical volume from the new disk. I
didn't investigate and it restarted OK.


* Re: sata_via bus errors fixed?
  2011-01-26 10:00       ` Dave Howorth
@ 2011-01-26 10:11         ` Tejun Heo
  2011-01-31 10:50           ` Dave Howorth
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2011-01-26 10:11 UTC (permalink / raw)
  To: Dave Howorth; +Cc: linux-ide

Hello,

On Wed, Jan 26, 2011 at 10:00:57AM +0000, Dave Howorth wrote:
> Just a quick progress note. I'm most familiar with Suse and not familiar
> with kernel builds so last night I installed the pre-built
> Kernel:/HEAD/11.2 kernel from the opensuse build service. It's 2.6.37.

There are plenty of howtos on building kernel.  It takes some time to
get used to but once you get a hang of it there's nothing difficult
about it.  It usually is best to figure out small kernel configuration
which contain everything necessary to bring up the root filesystem and
other important hardware so that you don't have to worry about initrd
and modules.

> It was late so all I did was copy a 1.7 GB file to and from the new disk
> on the VT6421A controller. No errors were logged. Incidentally, what's
> the best way of testing for this type of reliability?

The best way would be finding out a workload which doesn't take too
much time but reliably triggers the issue on a problematic kernel.

> Tonight I plan to compile that kernel, which should confirm that I
> understand how to do it, and then I'll disable the VIA FIFO workaround
> as you suggest and see what happens.

Have fun!

-- 
tejun


* Re: sata_via bus errors fixed?
  2011-01-26 10:11         ` Tejun Heo
@ 2011-01-31 10:50           ` Dave Howorth
  2011-01-31 10:53             ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Howorth @ 2011-01-31 10:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide

Tejun Heo wrote:
> There are plenty of howtos on building kernel.  It takes some time to
> get used to but once you get a hang of it there's nothing difficult
> about it.

Hmm, I've finally managed to get the compilation to work often enough to
run my tests but I don't think I'll be taking up kernel compiling as a
hobby :(

> The best way would be finding out a workload which doesn't take too
> much time but reliably triggers the issue on a problematic kernel.

That would be booting then :) The dmesg reliably has or does not have
the error messages.
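
Checking a saved dmesg for the error messages can be scripted so that every test kernel is judged the same way. A minimal sketch, assuming the three message fragments below (taken from the log excerpts earlier in this thread) are a good enough signature; the pattern list and function name are illustrative, not a complete catalogue of libata errors:

```python
import re

# Fragments copied from the logs posted earlier in the thread.
ATA_TROUBLE = re.compile(
    r"ata\d+(\.\d+)?: (exception Emask|hard resetting link|limiting speed)"
)

def ata_trouble_lines(dmesg_text: str) -> list:
    """Return the dmesg lines that look like libata error handling."""
    return [line for line in dmesg_text.splitlines() if ATA_TROUBLE.search(line)]

sample = """\
[  109.441492] ata2.00: exception Emask 0x12 SAct 0x0 SErr 0x1000500 action 0x6
[  109.441575] ata2: hard resetting link
[  109.784337] ata2: EH complete
[  197.042033] ata5.00: limiting speed to UDMA/66:PIO4
"""
print(len(ata_trouble_lines(sample)))  # 3
```

An empty result after boot then counts as "works", anything else as "shows the problem".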

>> Tonight I plan to compile that kernel, which should confirm that I
>> understand how to do it, and then I'll disable the VIA FIFO workaround
>> as you suggest and see what happens.

OK. So the results are now in:

(1) stock SUSE HEAD/11.2 (2.6.37) - works fine with no problem
(2) SUSE 2.6.37 with the 'magic patch' commented out - shows the problem
(3) stock updated SUSE 11.2 (2.6.31.14) - shows the problem
(4) stock SUSE 11.2 with the 'magic patch' added (2.6.31.5) - works fine
with no problem

I think that's fairly indicative that the 'magic patch' is able to cure
my problem with a VT6421A controller and a Samsung HD103SJ drive.
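
The four results form a small two-by-two experiment: each kernel series was tried with and without the patch. A minimal sketch of that reasoning, assuming (as the list implies) that the stock 2.6.31.14 kernel lacks the patch and the stock 2.6.37 kernel has it; the labels are shorthand for the four entries above:

```python
# (kernel, magic patch present, problem seen), as reported in the list above.
results = [
    ("2.6.37 stock", True, False),
    ("2.6.37, patch commented out", False, True),
    ("2.6.31.14 stock", False, True),
    ("2.6.31.5 with patch added", True, False),
]

# The patch explains the outcome if the problem appears exactly when it is absent.
consistent = all(problem == (not patched) for _, patched, problem in results)
print(consistent)  # True
```

Because the outcome follows the patch in both the old and the new kernel, other changes between 2.6.31 and 2.6.37 are unlikely to be what fixed the VIA card.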

One thing I still need to investigate is why Ubuntu 10.04 was showing
the problem, but perhaps I never got as far as updating the kernel after
installing it. (AFAIK as installed it is 2.6.31, whilst the current
update is 2.6.32-28 which I think should have the patch)

Thanks for your help.
Dave


* Re: sata_via bus errors fixed?
  2011-01-31 10:50           ` Dave Howorth
@ 2011-01-31 10:53             ` Tejun Heo
  2011-01-31 12:19               ` Dave Howorth
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2011-01-31 10:53 UTC (permalink / raw)
  To: Dave Howorth; +Cc: linux-ide

Hello,

On Mon, Jan 31, 2011 at 10:50:03AM +0000, Dave Howorth wrote:
> (1) stock SUSE HEAD/11.2 (2.6.37) - works fine with no problem
> (2) SUSE 2.6.37 with the 'magic patch' commented out - shows the problem
> (3) stock updated SUSE 11.2 (2.6.31.14) - shows the problem
> (4) stock SUSE 11.2 with the 'magic patch' added (2.6.31.5) - works fine
> with no problem
> 
> I think that's fairly indicative that the 'magic patch' is able to cure
> my problem with a VT6421A controller and a Samsung HD103SJ drive.

I see, so the via issue was the same FIFO configuration problem.

> One thing I still need to investigate is why Ubuntu 10.04 was showing
> the problem, but perhaps I never got as far as updating the kernel after
> installing it. (AFAIK as installed it is 2.6.31, whilst the current
> update is 2.6.32-28 which I think should have the patch)

Most likely.  So, that leaves the problem sil was seeing.  Are you
interested in digging that down too?

Thank you.

-- 
tejun


* Re: sata_via bus errors fixed?
  2011-01-31 10:53             ` Tejun Heo
@ 2011-01-31 12:19               ` Dave Howorth
  2011-01-31 13:24                 ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Howorth @ 2011-01-31 12:19 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide

Tejun Heo wrote:
>> One thing I still need to investigate is why Ubuntu 10.04 was showing
>> the problem, but perhaps I never got as far as updating the kernel after
>> installing it. (AFAIK as installed it is 2.6.31, whilst the current
>> update is 2.6.32-28 which I think should have the patch)
> 
> Most likely.

I've just been googling and peeking and it seems there may be another
possibility but I'm much less familiar with Ubuntu so I could easily be
misinterpreting something.

I have an Ubuntu box at work and I just installed the
linux-source-2.6.32-28.55 package on it. When I look at sata_via.c I see
that there's a different (earlier?) version of the patch there than what
is in 2.6.37 (the test is pdev->device == 0x3249 versus one based on
board_id)

So it's conceivable that I did update my lucid installation at home but
that this version of the patch doesn't fix the problem. I'll check
exactly what's what tonight.

> So, that leaves the problem sil was seeing.  Are you
> interested in digging that down too?

Well I don't mind testing that card with the kernels I've now got,
assuming I can borrow the card again. I'm a lot less enthusiastic about
compiling more new versions though.

Firstly, I've got to finish sorting out my machine - my backup system
has stopped working for example and I've still got a bunch of data to
migrate/integrate from the old disk. And I need to upgrade SUSE to 11.3
to see if that fixes this problem without introducing any new ones.

Secondly, kernel compilation in my limited experience has been a big
hassle. Every time I change anything in sata_via.c it seems to want to
recompile absolutely everything and that takes over two hours. I thought
make was supposed to do minimal recompilation? That plus a bug in the
script, the unexpectedly large amount of space required and some finger
trouble made it a less than smooth process.

Since we don't yet know whether a recent kernel fixes my sil problem and
if it does we don't have any idea of a particular patch, that suggests
the strategy would be bisection, which probably involves more compiles
than I'm up for.

But first steps first. I'll see if I can test it with my existing kernels.

Cheers, Dave


* Re: sata_via bus errors fixed?
  2011-01-31 12:19               ` Dave Howorth
@ 2011-01-31 13:24                 ` Tejun Heo
  2011-02-04 12:09                   ` Dave Howorth
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2011-01-31 13:24 UTC (permalink / raw)
  To: Dave Howorth; +Cc: linux-ide

Hello,

On Mon, Jan 31, 2011 at 12:19:31PM +0000, Dave Howorth wrote:
> I have an Ubuntu box at work and I just installed the
> linux-source-2.6.32-28.55 package on it. When I look at sata_via.c I see
> that there's a different (earlier?) version of the patch there than what
> is in 2.6.37 (the test is pdev->device == 0x3249 versus one based on
> board_id)

Yes, originally the FIFO fix was applied more narrowly.  Later it got
discovered that the fix was necessary for all 6421's and updated.

> So it's conceivable that I did update my lucid installation at home but
> that this version of the patch doesn't fix the problem. I'll check
> exactly what's what tonight.

Yeah, if that's the case and if it's still not updated in ubuntu,
opening a bug report against ubuntu would be a good idea.

> > So, that leaves the problem sil was seeing.  Are you
> > interested in digging that down too?
> 
> Well I don't mind testing that card with the kernels I've now got,
> assuming I can borrow the card again. I'm a lot less enthusiastic about
> compiling more new versions though.
> 
> Firstly, I've got to finish sorting out my machine - my backup system
> has stopped working for example and I've still got a bunch of data to
> migrate/integrate from the old disk. And I need to upgrade SUSE to 11.3
> to see if that fixes this problem without introducing any new ones.

Okay.

> Secondly, kernel compilation in my limited experience has been a big
> hassle. Every time I change anything in sata_via.c it seems to want to
> recompile absolutely everything and that takes over two hours. I thought
> make was supposed to do minimal recompilation? That plus a bug in the
> script, the unexpectedly large amount of space required and some finger
> trouble made it a less than smooth process.

It really shouldn't be like that.  If you're using distro build
scripts, it might behave like that but if you're building directly
from vanilla tarball and if you just modify sata_via, it will just
recompile sata_via and relink the kernel which will probably take
something like a couple minutes.

> Since we don't yet know whether a recent kernel fixes my sil problem and
> if it does we don't have any idea of a particular patch, that suggests
> the strategy would be bisection, which probably involves more compiles
> than I'm up for.

If sil shows the problem, I'll provide debug patches so that you'll
only need to build libata and sata_via.

Thanks.

-- 
tejun


* Re: sata_via bus errors fixed?
  2011-01-31 13:24                 ` Tejun Heo
@ 2011-02-04 12:09                   ` Dave Howorth
  2011-02-08 11:16                     ` Dave Howorth
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Howorth @ 2011-02-04 12:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide

Tejun Heo wrote:
>> Since we don't yet know whether a recent kernel fixes my sil problem and
>> if it does we don't have any idea of a particular patch, that suggests
>> the strategy would be bisection, which probably involves more compiles
>> than I'm up for.
> 
> If sil shows the problem, I'll provide debug patches so that you'll
> only need to build libata and sata_via.

OK, I fixed my backup system and I feel a bit more relaxed now, so I did
a quick test. I borrowed the Sil 3512-based card again and ran it with
Knoppix and also with the SUSE 2.6.37 kernel. Neither of them indicated
any errors with the drive so I suppose that some change between 2.6.32
and 2.6.37 has already fixed whatever my problem was with that
controller. One less thing to worry about.

I haven't got around to doing any of the other things I mentioned yet :(

Cheers, Dave


* Re: sata_via bus errors fixed?
  2011-02-04 12:09                   ` Dave Howorth
@ 2011-02-08 11:16                     ` Dave Howorth
  2011-02-09  9:43                       ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Howorth @ 2011-02-08 11:16 UTC (permalink / raw)
  To: Tejun Heo, linux-ide

Dave Howorth wrote:
> Tejun Heo wrote:
>>> Since we don't yet know whether a recent kernel fixes my sil problem and
>>> if it does we don't have any idea of a particular patch, that suggests
>>> the strategy would be bisection, which probably involves more compiles
>>> than I'm up for.
>> If sil shows the problem, I'll provide debug patches so that you'll
>> only need to build libata and sata_via.
> 
> OK, I fixed my backup system and I feel a bit more relaxed now, so I did
> a quick test. I borrowed the Sil 3512-based card again and ran it with
> Knoppix and also with the SUSE 2.6.37 kernel. Neither of them indicated
> any errors with the drive so I suppose that some change between 2.6.32
> and 2.6.37 has already fixed whatever my problem was with that
> controller. One less thing to worry about.
> 
> I haven't got around to doing any of the other things I mentioned yet :(

Hmm, things are not quite so clear-cut. Yesterday, with the Sil card
still in the system, I happened to reboot into the original unpatched
2.6.32 kernel and unfortunately everything worked perfectly!

So there must be some other factor involved. The most likely one is that
after running the main batch of tests I removed an old PATA drive from
the system, which was connected to the mainboard. It's also
theoretically possible, of course, that every other time I tested the
system I systematically misaligned a SATA cable when running the old
kernel or some such. Anyway, I no longer have a failing test case for
the Sil card.

Since I need to plug my new Via card back in anyway, I'll rerun the old
kernel with that just to see whether it still shows the problem when the
PATA drive is not in the system.

Cheers, Dave

PS I guess since I haven't heard from you recently, it might be
appropriate to wish you Happy New Year!


* Re: sata_via bus errors fixed?
  2011-02-08 11:16                     ` Dave Howorth
@ 2011-02-09  9:43                       ` Tejun Heo
  2011-02-09 10:04                         ` Dave Howorth
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2011-02-09  9:43 UTC (permalink / raw)
  To: Dave Howorth; +Cc: linux-ide

Hello,

On Tue, Feb 08, 2011 at 11:16:38AM +0000, Dave Howorth wrote:
> Hmm, things are not quite so clear-cut. Yesterday, with the Sil card
> still in the system, I happened to reboot into the original unpatched
> 2.6.32 kernel and unfortunately everything worked perfectly!
> 
> So there must be some other factor involved. The most likely one is that
> after running the main batch of tests I removed an old PATA drive from
> the system, which was connected to the mainboard. It's also
> theoretically possible, of course, that every other time I tested the
> system I systematically misaligned a SATA cable when running the old
> kernel or some such. Anyway, I no longer have a failing test case for
> the Sil card.

Hmmm... interesting.

> Since I need to plug my new Via card back in anyway, I'll rerun the old
> kernel with that just to see whether it still shows the problem when the
> PATA drive is not in the system.

Alright, let us know if you encounter problems.

> PS I guess since I haven't heard from you recently, it might be
> appropriate to wish you Happy New Year!

Happy new year!

-- 
tejun


* Re: sata_via bus errors fixed?
  2011-02-09  9:43                       ` Tejun Heo
@ 2011-02-09 10:04                         ` Dave Howorth
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Howorth @ 2011-02-09 10:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-ide

Tejun Heo wrote:
>> Since I need to plug my new Via card back in anyway, I'll rerun the old
>> kernel with that just to see whether it still shows the problem when the
>> PATA drive is not in the system.
> 
> Alright, let us know if you encounter problems.

I returned the borrowed Sil card and plugged my new Via card back in,
and now see no problems even with the old kernel. That's with no PATA
disk in the system any longer, so it looks like having a PATA drive
plugged into the motherboard is a precondition for the bus errors on
the adapter cards.

Too many variables, my head hurts. But at least it seems there are no
outstanding problems from this scenario going forward.

Cheers, Dave


Thread overview: 15+ messages
2011-01-24 16:06 sata_via bus errors fixed? Dave Howorth
2011-01-24 17:04 ` Tejun Heo
2011-01-24 17:23   ` Dave Howorth
2011-01-25 10:35   ` Dave Howorth
2011-01-25 10:50     ` Tejun Heo
2011-01-26 10:00       ` Dave Howorth
2011-01-26 10:11         ` Tejun Heo
2011-01-31 10:50           ` Dave Howorth
2011-01-31 10:53             ` Tejun Heo
2011-01-31 12:19               ` Dave Howorth
2011-01-31 13:24                 ` Tejun Heo
2011-02-04 12:09                   ` Dave Howorth
2011-02-08 11:16                     ` Dave Howorth
2011-02-09  9:43                       ` Tejun Heo
2011-02-09 10:04                         ` Dave Howorth
