All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
@ 2009-10-21 18:55 Mikulas Patocka
  2009-10-21 19:34 ` Mikael Pettersson
  2009-10-21 19:39 ` Bartlomiej Zolnierkiewicz
  0 siblings, 2 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-10-21 18:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-ide

Hi

This patch fixes a data corruption when SSD is connected to Ultra 5.

Mikulas

--

Serialize CMD643 and CMD646 to fix a hardware bug with SSD

CMD646 corrupts data on concurrent transfers on both channels when IDE SSD is
connected to one of the channels.

Setup that demonstrates this hardware bug: Ultra 5, onboard CMD646, rev 3.
/dev/hda is 8GB Seagate ST38410A in MWDMA2
/dev/hdd is 32GB SSD SiliconHardDisk in MWDMA2

- When reading /dev/hdd (for example with dd or fsck), reads from /dev/hda
  are corrupted, there are twiddled single bits 1->0 and some full 32-bit
  words corrupted, sometimes commands fail (which switches /dev/hda to
  PIO mode but the corruptions happen even in PIO).
- Reads from /dev/hdd don't seem to be corrupted (i.e. fsck passes fine).
- When I connected normal rotating harddisk to /dev/hdd, there was no
  corruption, so the corruption is something specific to SSD.
- I tried the same setup on a PCI card with CMD649 and saw no corruption.

This patch serializes the operation for CMD646 and 643 (I didn't test
CMD643 but it may have the same hw bug too because it's earlier design).
CMD649 is good. I don't know anything about CMD 648.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/ide/cmd64x.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6.31-fast/drivers/ide/cmd64x.c
===================================================================
--- linux-2.6.31-fast.orig/drivers/ide/cmd64x.c	2009-10-21 06:08:42.000000000 +0200
+++ linux-2.6.31-fast/drivers/ide/cmd64x.c	2009-10-21 06:09:09.000000000 +0200
@@ -379,7 +379,8 @@ static const struct ide_port_info cmd64x
 		.enablebits	= {{0x00,0x00,0x00}, {0x51,0x08,0x08}},
 		.port_ops	= &cmd64x_port_ops,
 		.host_flags	= IDE_HFLAG_CLEAR_SIMPLEX |
-				  IDE_HFLAG_ABUSE_PREFETCH,
+				  IDE_HFLAG_ABUSE_PREFETCH |
+				  IDE_HFLAG_SERIALIZE,
 		.pio_mask	= ATA_PIO5,
 		.mwdma_mask	= ATA_MWDMA2,
 		.udma_mask	= 0x00, /* no udma */
@@ -389,7 +390,8 @@ static const struct ide_port_info cmd64x
 		.init_chipset	= init_chipset_cmd64x,
 		.enablebits	= {{0x51,0x04,0x04}, {0x51,0x08,0x08}},
 		.port_ops	= &cmd648_port_ops,
-		.host_flags	= IDE_HFLAG_ABUSE_PREFETCH,
+		.host_flags	= IDE_HFLAG_ABUSE_PREFETCH |
+				  IDE_HFLAG_SERIALIZE,
 		.pio_mask	= ATA_PIO5,
 		.mwdma_mask	= ATA_MWDMA2,
 		.udma_mask	= ATA_UDMA2,

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-21 18:55 [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD Mikulas Patocka
@ 2009-10-21 19:34 ` Mikael Pettersson
  2009-10-21 23:01   ` Mikulas Patocka
  2009-10-21 19:39 ` Bartlomiej Zolnierkiewicz
  1 sibling, 1 reply; 63+ messages in thread
From: Mikael Pettersson @ 2009-10-21 19:34 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David S. Miller, linux-ide

Mikulas Patocka writes:
 > Hi
 > 
 > This patch fixes a data corruption when SSD is connected to Ultra 5.
 > 
 > Mikulas
 > 
 > --
 > 
 > Serialize CMD643 and CMD646 to fix a hardware bug with SSD
 > 
 > CMD646 corrupts data on concurrent transfers on both channels when IDE SSD is
 > connected to one of the channels.
 > 
 > Setup that demonstrates this hardware bug: Ultra 5, onboard CMD646, rev 3.
 > /dev/hda is 8GB Seagate ST38410A in MWDMA2
 > /dev/hdd is 32GB SSD SiliconHardDisk in MWDMA2
 > 
 > - When reading /dev/hdd (for example with dd or fsck), reads from /dev/hda
 >   are corrupted, there are twiddled single bits 1->0 and some full 32-bit
 >   words corrupted, sometimes commands fail (which switches /dev/hda to
 >   PIO mode but the corruptions happen even in PIO).
 > - Reads from /dev/hdd don't seem to be corrupted (i.e. fsck passes fine).
 > - When I connected normal rotating harddisk to /dev/hdd, there was no
 >   corruption, so the corruption is something specific to SSD.
 > - I tried the same setup on a PCI card with CMD649 and saw no corruption.
 > 
 > This patch serializes the operation for CMD646 and 643 (I didn't test
 > CMD643 but it may have the same hw bug too because it's earlier design).
 > CMD649 is good. I don't know anything about CMD 648.

Have you checked if the libata pata_cmd64x driver also has this
problem?  I know that it works fine on the Ultra 5 with just a
single pata drive, but the pata+ssd combo may not have been tested.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-21 18:55 [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD Mikulas Patocka
  2009-10-21 19:34 ` Mikael Pettersson
@ 2009-10-21 19:39 ` Bartlomiej Zolnierkiewicz
  2009-10-22  0:41   ` David Miller
  1 sibling, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-10-21 19:39 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David S. Miller, linux-ide

On Wednesday 21 October 2009 20:55:28 Mikulas Patocka wrote:
> Hi
> 
> This patch fixes a data corruption when SSD is connected to Ultra 5.
> 
> Mikulas
> 
> --
> 
> Serialize CMD643 and CMD646 to fix a hardware bug with SSD
> 
> CMD646 corrupts data on concurrent transfers on both channels when IDE SSD is
> connected to one of the channels.
> 
> Setup that demonstrates this hardware bug: Ultra 5, onboard CMD646, rev 3.
> /dev/hda is 8GB Seagate ST38410A in MWDMA2
> /dev/hdd is 32GB SSD SiliconHardDisk in MWDMA2
> 
> - When reading /dev/hdd (for example with dd or fsck), reads from /dev/hda
>   are corrupted, there are twiddled single bits 1->0 and some full 32-bit
>   words corrupted, sometimes commands fail (which switches /dev/hda to
>   PIO mode but the corruptions happen even in PIO).
> - Reads from /dev/hdd don't seem to be corrupted (i.e. fsck passes fine).
> - When I connected normal rotating harddisk to /dev/hdd, there was no
>   corruption, so the corruption is something specific to SSD.
> - I tried the same setup on a PCI card with CMD649 and saw no corruption.
> 
> This patch serializes the operation for CMD646 and 643 (I didn't test
> CMD643 but it may have the same hw bug too because it's earlier design).
> CMD649 is good. I don't know anything about CMD 648.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Seems like SSD (simply by being faster) triggers some race condition that
hardware has tolerated in the past and since we used to always serialize
operation for CMD646 before:

commit e01698aed04811b9a9c4f8d54b73cb182757063d           
Author: David S. Miller <davem@davemloft.net>             
Date:   Sun Jun 21 22:48:03 2009 -0700

    ide cmd64x: Remove serialize setting.

it went undetected until now..

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-21 19:34 ` Mikael Pettersson
@ 2009-10-21 23:01   ` Mikulas Patocka
  2009-10-27 11:34     ` Alan Cox
  0 siblings, 1 reply; 63+ messages in thread
From: Mikulas Patocka @ 2009-10-21 23:01 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: David S. Miller, linux-ide

>  > This patch serializes the operation for CMD646 and 643 (I didn't test
>  > CMD643 but it may have the same hw bug too because it's earlier design).
>  > CMD649 is good. I don't know anything about CMD 648.
> 
> Have you checked if the libata pata_cmd64x driver also has this
> problem?  I know that it works fine on the Ultra 5 with just a
> single pata drive, but the pata+ssd combo may not have been tested.

I have tried libata driver now --- I see no data corruption but I've got 
this error under the same operation (both channels reading):

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: BMDMA stat 0x24
ata1.00: cmd c8/00:10:10:c8:d9/00:00:00:00:00/e0 tag 0 dma 8192 in
         res 50/00:00:1f:c8:d9/00:00:00:00:00/e0 Emask 0x2 (HSM violation)
ata1.00: status: { DRDY }
pata_cmd64x: active 10 recovery 10 setup 3.
pata_cmd64x: active 10 recovery 10 setup 3.
ata1: soft resetting link
pata_cmd64x: active 3 recovery 1 setup 1.
pata_cmd64x: active 3 recovery 1 setup 1.
ata1.00: configured for MWDMA2
ata1: EH complete

(ata1 is the primary channel)
--- I'm wondering, what does it mean? status 0x50 is OK. DMA status 0x24 
is also OK. What was the problem there?

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-21 19:39 ` Bartlomiej Zolnierkiewicz
@ 2009-10-22  0:41   ` David Miller
  2009-10-22  9:44     ` Bartlomiej Zolnierkiewicz
                       ` (3 more replies)
  0 siblings, 4 replies; 63+ messages in thread
From: David Miller @ 2009-10-22  0:41 UTC (permalink / raw)
  To: bzolnier; +Cc: mpatocka, linux-ide, elendil

From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Wed, 21 Oct 2009 21:39:24 +0200

> Seems like SSD (simply by being faster) triggers some race condition that
> hardware has tolerated in the past and since we used to always serialize
> operation for CMD646 before:

Yes, and technically we only did the synchronization for one
of the two chips mpatocka is adding the serialize setting to.

> commit e01698aed04811b9a9c4f8d54b73cb182757063d           
> Author: David S. Miller <davem@davemloft.net>             
> Date:   Sun Jun 21 22:48:03 2009 -0700
> 
>     ide cmd64x: Remove serialize setting.
> 
> it went undetected until now..

Right, and see also:

commit 6b5cde3629701258004b94cde75dd1089b556b02
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date:   Mon Dec 29 20:27:32 2008 +0100

    cmd64x: set IDE_HFLAG_SERIALIZE explictly for CMD646

Which is how we got there.

The most conservative thing to do is to set the flag as
is done by mpatocka's patch but I'd like Frans to regression
test that on his ultra5.

Frans can you do that?

Thanks.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22  0:41   ` David Miller
@ 2009-10-22  9:44     ` Bartlomiej Zolnierkiewicz
  2009-10-22 11:00       ` David Miller
  2009-10-23 14:29       ` Mikulas Patocka
  2009-10-22 13:56     ` Alan Cox
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-10-22  9:44 UTC (permalink / raw)
  To: David Miller; +Cc: mpatocka, linux-ide, elendil

On Thursday 22 October 2009 02:41:55 David Miller wrote:
> From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date: Wed, 21 Oct 2009 21:39:24 +0200
> 
> > Seems like SSD (simply by being faster) triggers some race condition that
> > hardware has tolerated in the past and since we used to always serialize
> > operation for CMD646 before:
> 
> Yes, and technically we only did the synchronization for one
> of the two chips mpatocka is adding the serialize setting to.
> 
> > commit e01698aed04811b9a9c4f8d54b73cb182757063d           
> > Author: David S. Miller <davem@davemloft.net>             
> > Date:   Sun Jun 21 22:48:03 2009 -0700
> > 
> >     ide cmd64x: Remove serialize setting.
> > 
> > it went undetected until now..
> 
> Right, and see also:
> 
> commit 6b5cde3629701258004b94cde75dd1089b556b02
> Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date:   Mon Dec 29 20:27:32 2008 +0100
> 
>     cmd64x: set IDE_HFLAG_SERIALIZE explictly for CMD646
> 
> Which is how we got there.

We are through this the second time and you're still not willing
neither to listen nor to read the code.  We always did serialization
for CMD646, we just used  hwif->chipset == ide_cmd646 (without using
IDE_HFLAG_SERIALIZE flag):

1061                 if (h && h->hwgroup) {  /* scan only initialized ports */
1062                         if (hwif->irq == h->irq) {
1063                                 hwif->sharing_irq = h->sharing_irq = 1;
1064                                 if (hwif->chipset != ide_pci ||
1065                                     h->chipset != ide_pci) {
1066                                         save_match(hwif, h, &match);
1067                                 }
1068                         }

so the code was using the same serialized hwgroup for CMD646 (which always
uses shared PCI IRQ AFAIK) because of hwif->chipset == ide_cmd646.  My patch
only made this explicit in preparation for other changes (one of such other
changes resulted later in uncovering unexpected IRQ problem on Ultra 5).

> The most conservative thing to do is to set the flag as
> is done by mpatocka's patch but I'd like Frans to regression
> test that on his ultra5.

Agreed, though I wonder whether we should also provide module parameter to
disable serializing on those chipsets for people not using SSDs...

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22  9:44     ` Bartlomiej Zolnierkiewicz
@ 2009-10-22 11:00       ` David Miller
  2009-10-22 11:15         ` Bartlomiej Zolnierkiewicz
  2009-10-23 14:29       ` Mikulas Patocka
  1 sibling, 1 reply; 63+ messages in thread
From: David Miller @ 2009-10-22 11:00 UTC (permalink / raw)
  To: bzolnier; +Cc: mpatocka, linux-ide, elendil

From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Thu, 22 Oct 2009 11:44:04 +0200

> We are through this the second time and you're still not willing
> neither to listen nor to read the code.

I can understand why you might attribute malice and ignorance to my
actions by default, but it doesn't apply here.

> We always did serialization for CMD646, we just used hwif->chipset
> == ide_cmd646 (without using IDE_HFLAG_SERIALIZE flag):

I fully acknowledge that we always serialized, sorry for not
being explicit.

I was just showing where the serialize IRQ flag got added
(and again, it was a correct change).

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22 11:00       ` David Miller
@ 2009-10-22 11:15         ` Bartlomiej Zolnierkiewicz
  2009-10-22 11:20           ` David Miller
  0 siblings, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-10-22 11:15 UTC (permalink / raw)
  To: David Miller; +Cc: mpatocka, linux-ide, elendil

On Thursday 22 October 2009 13:00:49 David Miller wrote:
> From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date: Thu, 22 Oct 2009 11:44:04 +0200
> 
> > We always did serialization for CMD646, we just used hwif->chipset
> > == ide_cmd646 (without using IDE_HFLAG_SERIALIZE flag):
> 
> I fully acknowledge that we always serialized, sorry for not
> being explicit.
> 
> I was just showing where the serialize IRQ flag got added
> (and again, it was a correct change).

What for?  The only patch changing behavior was yours and it seems
your luck is much worse than mine when it comes to unexpected side
effects.. ;)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22 11:15         ` Bartlomiej Zolnierkiewicz
@ 2009-10-22 11:20           ` David Miller
  0 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2009-10-22 11:20 UTC (permalink / raw)
  To: bzolnier; +Cc: mpatocka, linux-ide, elendil

From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Thu, 22 Oct 2009 13:15:59 +0200

> On Thursday 22 October 2009 13:00:49 David Miller wrote:
>> From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
>> Date: Thu, 22 Oct 2009 11:44:04 +0200
>> 
>> > We always did serialization for CMD646, we just used hwif->chipset
>> > == ide_cmd646 (without using IDE_HFLAG_SERIALIZE flag):
>> 
>> I fully acknowledge that we always serialized, sorry for not
>> being explicit.
>> 
>> I was just showing where the serialize IRQ flag got added
>> (and again, it was a correct change).
> 
> What for?  The only patch changing behavior was yours and it seems
> your luck is much worse than mine when it comes to unexpected side
> effects.. ;)

Like I said, you can attribute my actions to malice or ignorance
if you want, but it simply isn't there.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22  0:41   ` David Miller
  2009-10-22  9:44     ` Bartlomiej Zolnierkiewicz
@ 2009-10-22 13:56     ` Alan Cox
  2009-10-23  1:30       ` David Miller
  2009-10-23 14:50     ` Mikulas Patocka
  2009-10-24 11:28     ` Frans Pop
  3 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2009-10-22 13:56 UTC (permalink / raw)
  To: David Miller; +Cc: bzolnier, mpatocka, linux-ide, elendil

I doubt there is any hardware bug with the SSD. Some CMD64x hardware
certainly has errata and as the data sheets were published your starting
point would be the data sheets for the various chip versions. Does the
sparc use the same off the shelf part as the PC people or does it have
different glue or chip versions (eg as a macro cell in something else ?)

Alan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22 13:56     ` Alan Cox
@ 2009-10-23  1:30       ` David Miller
  0 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2009-10-23  1:30 UTC (permalink / raw)
  To: alan; +Cc: bzolnier, mpatocka, linux-ide, elendil

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Thu, 22 Oct 2009 14:56:13 +0100

> I doubt there is any hardware bug with the SSD. Some CMD64x hardware
> certainly has errata and as the data sheets were published your starting
> point would be the data sheets for the various chip versions. Does the
> sparc use the same off the shelf part as the PC people or does it have
> different glue or chip versions (eg as a macro cell in something else ?)

The sparc machines use the standard cmd64x chips without any special
glue logic.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22  9:44     ` Bartlomiej Zolnierkiewicz
  2009-10-22 11:00       ` David Miller
@ 2009-10-23 14:29       ` Mikulas Patocka
  2009-10-23 14:31         ` David Miller
                           ` (2 more replies)
  1 sibling, 3 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-10-23 14:29 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: David Miller, linux-ide, elendil

> We are through this the second time and you're still not willing
> neither to listen nor to read the code.  We always did serialization
> for CMD646, we just used  hwif->chipset == ide_cmd646 (without using
> IDE_HFLAG_SERIALIZE flag):

> Agreed, though I wonder whether we should also provide module parameter to
> disable serializing on those chipsets for people not using SSDs...

Don't do it --- disks have cache and reading from the cache is as fast as 
reading from SSD (or even faster), so if there is some speed-race in the 
chip, there is a possibility that the data corruption happens with disks 
too --- just with lower probability.

If we don't know why the chip corrupts data, we must treat it as always 
corrupting data.

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 14:29       ` Mikulas Patocka
@ 2009-10-23 14:31         ` David Miller
  2009-10-23 14:44         ` Bartlomiej Zolnierkiewicz
  2009-10-23 17:15         ` [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD Alan Cox
  2 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2009-10-23 14:31 UTC (permalink / raw)
  To: mpatocka; +Cc: bzolnier, linux-ide, elendil

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Fri, 23 Oct 2009 10:29:16 -0400 (EDT)

> If we don't know why the chip corrupts data, we must treat it as always 
> corrupting data.

Completely agreed.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 14:29       ` Mikulas Patocka
  2009-10-23 14:31         ` David Miller
@ 2009-10-23 14:44         ` Bartlomiej Zolnierkiewicz
  2009-10-23 14:55           ` Mikulas Patocka
  2009-10-23 17:15         ` [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD Alan Cox
  2 siblings, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-10-23 14:44 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, linux-ide, elendil

On Friday 23 October 2009 16:29:16 Mikulas Patocka wrote:
> > We are through this the second time and you're still not willing
> > neither to listen nor to read the code.  We always did serialization
> > for CMD646, we just used  hwif->chipset == ide_cmd646 (without using
> > IDE_HFLAG_SERIALIZE flag):
> 
> > Agreed, though I wonder whether we should also provide module parameter to
> > disable serializing on those chipsets for people not using SSDs...
> 
> Don't do it --- disks have cache and reading from the cache is as fast as 
> reading from SSD (or even faster), so if there is some speed-race in the 
> chip, there is a possibility that the data corruption happens with disks 
> too --- just with lower probability.
> 
> If we don't know why the chip corrupts data, we must treat it as always 
> corrupting data.

I actually suspect that it is device/chipset specific interaction and not
generic problem so adding a fallback option (w/ BIG FAT WARNING) seem to
make sense..  Especially since we have never serialized on CMD643 and your
patch will be adding performance regression without even verifying that
the issue also affects this chipset..

-- 
Bartlomiej Zolnierkiewicz

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22  0:41   ` David Miller
  2009-10-22  9:44     ` Bartlomiej Zolnierkiewicz
  2009-10-22 13:56     ` Alan Cox
@ 2009-10-23 14:50     ` Mikulas Patocka
  2009-10-23 20:50       ` Sergei Shtylyov
  2009-10-24 11:28     ` Frans Pop
  3 siblings, 1 reply; 63+ messages in thread
From: Mikulas Patocka @ 2009-10-23 14:50 UTC (permalink / raw)
  To: David Miller; +Cc: bzolnier, linux-ide, elendil

> Right, and see also:
> 
> commit 6b5cde3629701258004b94cde75dd1089b556b02
> Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date:   Mon Dec 29 20:27:32 2008 +0100
> 
>     cmd64x: set IDE_HFLAG_SERIALIZE explictly for CMD646
> 
> Which is how we got there.
> 
> The most conservative thing to do is to set the flag as
> is done by mpatocka's patch but I'd like Frans to regression
> test that on his ultra5.
> 
> Frans can you do that?
> 
> Thanks.

I read the thread about wild interrupts that Frans was observing and that 
led to disabling the serialization.

The thing is tricky --- if we read status register on interrupt, we break 
the serialization principle and introduce potential data corruption.

If we don't read the status register, the wild interrupt kills the whole 
interrupt line.

I think the interrupt handler should read the BM-status register on both 
channels (it reflects the interrupt state even in PIO mode) to test if 
there is an unexpected interrupt on the inactive channel --- this should 
be much more safe than reading the status register. If there is an 
interrupt, then
--- read the status of the inactive channel? (potential data corruption, 
but it is reported to happen only on boot).
--- Or can the interrupt be acknowledged only in BM-status without 
touching the device? I believe yes, it shoud shut the PCI interrupt but it 
would leave the IDE interrupt line on (should be cleared on next command).

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 14:44         ` Bartlomiej Zolnierkiewicz
@ 2009-10-23 14:55           ` Mikulas Patocka
  2009-10-23 15:03             ` Bartlomiej Zolnierkiewicz
  2009-10-23 16:51             ` Alan Cox
  0 siblings, 2 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-10-23 14:55 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: David Miller, linux-ide, elendil



On Fri, 23 Oct 2009, Bartlomiej Zolnierkiewicz wrote:

> On Friday 23 October 2009 16:29:16 Mikulas Patocka wrote:
> > > We are through this the second time and you're still not willing
> > > neither to listen nor to read the code.  We always did serialization
> > > for CMD646, we just used  hwif->chipset == ide_cmd646 (without using
> > > IDE_HFLAG_SERIALIZE flag):
> > 
> > > Agreed, though I wonder whether we should also provide module parameter to
> > > disable serializing on those chipsets for people not using SSDs...
> > 
> > Don't do it --- disks have cache and reading from the cache is as fast as 
> > reading from SSD (or even faster), so if there is some speed-race in the 
> > chip, there is a possibility that the data corruption happens with disks 
> > too --- just with lower probability.
> > 
> > If we don't know why the chip corrupts data, we must treat it as always 
> > corrupting data.
> 
> I actually suspect that it is device/chipset specific interaction and not
> generic problem so adding a fallback option (w/ BIG FAT WARNING) seem to
> make sense..

So, why was it serialized before? I assume that either someone noticed the 
corruption or someone read some datasheet noting the corruption or someone 
reverse engineered some other driver and saw the serialization.

> Especially since we have never serialized on CMD643 and your patch will 
> be adding performance regression without even verifying that the issue 
> also affects this chipset..

Do you have this chip? If you were IDE maintainer, did you collect cards 
with IDE chips?

Mikulas

> -- 
> Bartlomiej Zolnierkiewicz
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 14:55           ` Mikulas Patocka
@ 2009-10-23 15:03             ` Bartlomiej Zolnierkiewicz
  2009-10-23 15:18               ` Daniela Engert
  2009-10-23 16:51             ` Alan Cox
  1 sibling, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-10-23 15:03 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, linux-ide, elendil

On Friday 23 October 2009 16:55:59 Mikulas Patocka wrote:
> 
> On Fri, 23 Oct 2009, Bartlomiej Zolnierkiewicz wrote:
> 
> > On Friday 23 October 2009 16:29:16 Mikulas Patocka wrote:
> > > > We are through this the second time and you're still not willing
> > > > neither to listen nor to read the code.  We always did serialization
> > > > for CMD646, we just used  hwif->chipset == ide_cmd646 (without using
> > > > IDE_HFLAG_SERIALIZE flag):
> > > 
> > > > Agreed, though I wonder whether we should also provide module parameter to
> > > > disable serializing on those chipsets for people not using SSDs...
> > > 
> > > Don't do it --- disks have cache and reading from the cache is as fast as 
> > > reading from SSD (or even faster), so if there is some speed-race in the 
> > > chip, there is a possibility that the data corruption happens with disks 
> > > too --- just with lower probability.
> > > 
> > > If we don't know why the chip corrupts data, we must treat it as always 
> > > corrupting data.
> > 
> > I actually suspect that it is device/chipset specific interaction and not
> > generic problem so adding a fallback option (w/ BIG FAT WARNING) seem to
> > make sense..
> 
> So, why was it serialized before? I assume that either someone noticed the 
> corruption or someone read some datasheet noting the corruption or someone 
> reverse engineered some other driver and saw the serialization.

Serialization is usually needed in case of chipset not handling concurrent
data transfers on both ports.  Unfortunately I don't know details for CMD646.

> > Especially since we have never serialized on CMD643 and your patch will 
> > be adding performance regression without even verifying that the issue 
> > also affects this chipset..
> 
> Do you have this chip? If you were IDE maintainer, did you collect cards 
> with IDE chips?

I recall having CMD648 and CMD649 cards somewhere, however not earlier
chipsets.

-- 
Bartlomiej Zolnierkiewicz

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 15:03             ` Bartlomiej Zolnierkiewicz
@ 2009-10-23 15:18               ` Daniela Engert
  0 siblings, 0 replies; 63+ messages in thread
From: Daniela Engert @ 2009-10-23 15:18 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Mikulas Patocka, David Miller, linux-ide, elendil

Bartlomiej Zolnierkiewicz schrieb:
> On Friday 23 October 2009 16:55:59 Mikulas Patocka wrote:
>    
>> On Fri, 23 Oct 2009, Bartlomiej Zolnierkiewicz wrote:
>>
>>      
>>> On Friday 23 October 2009 16:29:16 Mikulas Patocka wrote:
>>>        
>>>>> We are through this the second time and you're still not willing
>>>>> neither to listen nor to read the code.  We always did serialization
>>>>> for CMD646, we just used  hwif->chipset == ide_cmd646 (without using
>>>>> IDE_HFLAG_SERIALIZE flag):
>>>>>            
>>>>          
>>>>> Agreed, though I wonder whether we should also provide module parameter to
>>>>> disable serializing on those chipsets for people not using SSDs...
>>>>>            
>>>> Don't do it --- disks have cache and reading from the cache is as fast as
>>>> reading from SSD (or even faster), so if there is some speed-race in the
>>>> chip, there is a possibility that the data corruption happens with disks
>>>> too --- just with lower probability.
>>>>
>>>> If we don't know why the chip corrupts data, we must treat it as always
>>>> corrupting data.
>>>>          
>>> I actually suspect that it is device/chipset specific interaction and not
>>> generic problem so adding a fallback option (w/ BIG FAT WARNING) seem to
>>> make sense..
>>>        
>> So, why was it serialized before? I assume that either someone noticed the
>> corruption or someone read some datasheet noting the corruption or someone
>> reverse engineered some other driver and saw the serialization.
>>      
> Serialization is usually needed in case of chipset not handling concurrent
> data transfers on both ports.  Unfortunately I don't know details for CMD646.
>    
Guys, this is old news. CMD643 and CMD646 have a shared data path from 
the PCI interface to the ATA channel ports. In a document issued by CMD, 
they explain the requirement for serialization. This issue is rectified 
with CMD648 and later chips.
>    
>>> Especially since we have never serialized on CMD643 and your patch will
>>> be adding performance regression without even verifying that the issue
>>> also affects this chipset..
>>>        
>> Do you have this chip? If you were IDE maintainer, did you collect cards
>> with IDE chips?
>>      
> I recall having CMD648 and CMD649 cards somewhere, however not earlier
> chipsets.
>
>    


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 14:55           ` Mikulas Patocka
  2009-10-23 15:03             ` Bartlomiej Zolnierkiewicz
@ 2009-10-23 16:51             ` Alan Cox
  2009-10-23 17:27               ` Sergei Shtylyov
  1 sibling, 1 reply; 63+ messages in thread
From: Alan Cox @ 2009-10-23 16:51 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Bartlomiej Zolnierkiewicz, David Miller, linux-ide, elendil

> So, why was it serialized before? I assume that either someone noticed the 
> corruption or someone read some datasheet noting the corruption or someone 
> reverse engineered some other driver and saw the serialization.

The data sheet has some things to say for the 643 abd 646

In PIO mode it says you must check

CFR bit 2   (indicates primary interrupt)
ARTIM23 bit 4  (secondary interrupt)

Both bits can be written 1 to clear the interrupt. The doc doesn't say
anything about not touching status but it also uses the word "must" about
the other bits so make what you will of it.

In DMA mode I would read the BMDMA status in preference to status but the
doc doesn't specifically say to do so and simply say it works to the
spec. it isn't clear if ARTIM23 and CFR work for reporting/clearing a DMA
interrupt. You'd have to try it.

ARTIM23 may well need some locking care


The 646U2 adds the MRDMODE register so you can check the bits more sanely



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 14:29       ` Mikulas Patocka
  2009-10-23 14:31         ` David Miller
  2009-10-23 14:44         ` Bartlomiej Zolnierkiewicz
@ 2009-10-23 17:15         ` Alan Cox
  2 siblings, 0 replies; 63+ messages in thread
From: Alan Cox @ 2009-10-23 17:15 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Bartlomiej Zolnierkiewicz, David Miller, linux-ide, elendil

> Don't do it --- disks have cache and reading from the cache is as fast as 
> reading from SSD (or even faster), so if there is some speed-race in the 
> chip, there is a possibility that the data corruption happens with disks 
> too --- just with lower probability.
> 
> If we don't know why the chip corrupts data, we must treat it as always 
> corrupting data.

In which case we should delete the driver because serializing it is
probably not sufficient. There is a proper way to deal with IDE problems
and randomly turning things on and off isn't the solution from experience.

So
- It would be useful to get the data sheet that Daniela refers to
- If there is some kind of data path issue then serializing probably isn't
  enough on its own as has been said

and we need to understand why the libata case doesn't show corruption but
clearly shows it is unhappy. Libata doesn't serialize the device and
doesn't do fancy IRQ checking either.

Just serializing is likely to make it worse - it may well become a rare
deeply obscure corruption rather than a nice visible one like we have now
- and that is far far nastier. Most likely libata also needs work.





^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 16:51             ` Alan Cox
@ 2009-10-23 17:27               ` Sergei Shtylyov
  2009-10-23 18:22                 ` Alan Cox
  0 siblings, 1 reply; 63+ messages in thread
From: Sergei Shtylyov @ 2009-10-23 17:27 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mikulas Patocka, Bartlomiej Zolnierkiewicz, David Miller,
	linux-ide, elendil

Hello.

Alan Cox wrote:

>>So, why was it serialized before? I assume that either someone noticed the 
>>corruption or someone read some datasheet noting the corruption or someone 
>>reverse engineered some other driver and saw the serialization.

> The data sheet has some things to say for the 643 abd 646

> In PIO mode it says you must check

> CFR bit 2   (indicates primary interrupt)
> ARTIM23 bit 4  (secondary interrupt)

> Both bits can be written 1 to clear the interrupt.

    Not in the original PCI0646 datasheet -- it says that reading the 
register clears the interrupt bit. PCI0646U datashett however started 
talking about writing one to clear the bit.

> The doc doesn't say
> anything about not touching status but it also uses the word "must" about
> the other bits so make what you will of it.

    I think they use must in the sense that if the driver *really* needs to 
know from which channel the interrupt has come. Since most of the chip don't 
provide that capability, the real world Linux drivers, both old and new, had 
to get along without such knowledge. The IDE driver does read and clear the 
interrupt bits now though, in both PIO abnd DMA modes.

> In DMA mode I would read the BMDMA status in preference to status but the
> doc doesn't specifically say to do so and simply say it works to the
> spec. it isn't clear if ARTIM23 and CFR work for reporting/clearing a DMA
> interrupt. You'd have to try it.

    Earlier the IDE driver did chec CFR/ARTTIM23 bits before the BMIDE 
status interrupt bit in dma_test_irq() method and that used to work.  Now it 
also does so, via the new test_irq() method.

> ARTIM23 may well need some locking care

    Locking WRT what? It's not shared between channels...

> The 646U2 adds the MRDMODE register so you can check the bits more sanely

    Yeah. The libata driver however doesn't make use of that register, 
unlike  cmd64x. It doesn't even touch the interrupt bits on PCI0646, only 
manipulation the "old" bits on PCI-64[89].
    It seems that I need to sync up these two, pata_cmd64x.c seems to be 
lagging behind the changes in cmd64x.c. Well, not only this one, at least 
pata_hpt* drivers are lagging behind too...

WBR, Sergei

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 17:27               ` Sergei Shtylyov
@ 2009-10-23 18:22                 ` Alan Cox
  2009-10-23 18:52                   ` Bartlomiej Zolnierkiewicz
  2009-10-26 11:36                   ` Mikulas Patocka
  0 siblings, 2 replies; 63+ messages in thread
From: Alan Cox @ 2009-10-23 18:22 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Mikulas Patocka, Bartlomiej Zolnierkiewicz, David Miller,
	linux-ide, elendil

>     Yeah. The libata driver however doesn't make use of that register, 
> unlike  cmd64x. It doesn't even touch the interrupt bits on PCI0646, only 
> manipulation the "old" bits on PCI-64[89].

Yes I purposefully left out all the complexity because when I tested the
cards I had it simply wasn't needed. I'm also not sure we should merge
anything from the old to the new one until we know why the old one
corrupts and the new one merely gets upset. Accidentally porting over a
corruptor would be very bad indeed.

>     It seems that I need to sync up these two, pata_cmd64x.c seems to be 
> lagging behind the changes in cmd64x.c. Well, not only this one, at least 
> pata_hpt* drivers are lagging behind too...

That would be useful - although several of the differences are because
the pata_hpt driver took from the vendor supplied reference code not the
old IDE code. It also seems more stable for having done that.

Alan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 18:22                 ` Alan Cox
@ 2009-10-23 18:52                   ` Bartlomiej Zolnierkiewicz
  2009-10-24  3:24                     ` David Miller
  2009-10-26 11:36                   ` Mikulas Patocka
  1 sibling, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-10-23 18:52 UTC (permalink / raw)
  To: Alan Cox
  Cc: Sergei Shtylyov, Mikulas Patocka, David Miller, linux-ide, elendil

On Friday 23 October 2009 20:22:50 Alan Cox wrote:
> >     Yeah. The libata driver however doesn't make use of that register, 
> > unlike  cmd64x. It doesn't even touch the interrupt bits on PCI0646, only 
> > manipulation the "old" bits on PCI-64[89].
> 
> Yes I purposefully left out all the complexity because when I tested the
> cards I had it simply wasn't needed. I'm also not sure we should merge
> anything from the old to the new one until we know why the old one
> corrupts and the new one merely gets upset. Accidentally porting over a
> corruptor would be very bad indeed.

Oh, we know that.

"Old" one corrupts because somebody thought it would be a smart move to remove
serialized flag instead of debugging certain problem properly..

"New" one gets serialized at the command issue level (like all other new shiny
libata PATA stuff) and this may as well explain the difference..

> >     It seems that I need to sync up these two, pata_cmd64x.c seems to be 
> > lagging behind the changes in cmd64x.c. Well, not only this one, at least 
> > pata_hpt* drivers are lagging behind too...

Just from the memory:

pata_sis, pata_atiixp, pata_it8213, pata_cs5536, pata_amd..

So, gentlemen, when are you planning to remove IDE (not from your playground
Linux distribution but from kernel.org)?

I've heard that it will happen "soon" 4 years ago... :)

> That would be useful - although several of the differences are because
> the pata_hpt driver took from the vendor supplied reference code not the
> old IDE code. It also seems more stable for having done that.

Yeah, right..

This reminds me of that hpt366 blacklist that somebody forgot to port over
initially and which resulted in real world data corruptions..

-- 
Bartlomiej Zolnierkiewicz

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 14:50     ` Mikulas Patocka
@ 2009-10-23 20:50       ` Sergei Shtylyov
  2009-10-26 11:30         ` Mikulas Patocka
  0 siblings, 1 reply; 63+ messages in thread
From: Sergei Shtylyov @ 2009-10-23 20:50 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, bzolnier, linux-ide, elendil

Hello.

Mikulas Patocka wrote:

>> Right, and see also:
>>
>> commit 6b5cde3629701258004b94cde75dd1089b556b02
>> Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
>> Date:   Mon Dec 29 20:27:32 2008 +0100
>>
>>     cmd64x: set IDE_HFLAG_SERIALIZE explictly for CMD646
>>
>> Which is how we got there.
>>
>> The most conservative thing to do is to set the flag as
>> is done by mpatocka's patch but I'd like Frans to regression
>> test that on his ultra5.
>>
>> Frans can you do that?
>>
>> Thanks.
>>     
>
> I read the thread about wild interrupts that Frans was observing and that 
> led to disabling the serialization.
>
> The thing is tricky --- if we read status register on interrupt, we break 
> the serialization principle and introduce potential data corruption.
>
> If we don't read the status register, the wild interrupt kills the whole 
> interrupt line.
>
> I think the interrupt handler should read the BM-status register on both 
> channels (it reflects the interrupt state even in PIO mode) to test if 
>   

   Are you sure? Anyway, there's no need as we're reading the interrupt 
bits CFR/ARTTIM23 registers first (at least in the IDE code). Look at 
the cmd*_test_irq() methods and ide_intr().

> there is an unexpected interrupt on the inactive channel --- this should 
> be much more safe than reading the status register. If there is an 
> interrupt, then
> --- read the status of the inactive channel? (potential data corruption, 
> but it is reported to happen only on boot).
> --- Or can the interrupt be acknowledged only in BM-status without 
> touching the device? I believe yes,

   And I believe no. BMIDE status bit doesn't acknoledge (clear, to be 
precise) the IDE interrupts, only the status register read does.


> it shoud shut the PCI interrupt but it 
> would leave the IDE interrupt line on (should be cleared on next command).
>   

    I think the negated wired-OR of both INTRQ signals serves as an 
-INTA source, not the BMIDE status bits. At least in the general case, 
where the BMIDE status doesn't reflect PIO mode interrupts.

> Mikulas

WBR, Sergei



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 18:52                   ` Bartlomiej Zolnierkiewicz
@ 2009-10-24  3:24                     ` David Miller
  2009-10-24 12:38                       ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2009-10-24  3:24 UTC (permalink / raw)
  To: bzolnier; +Cc: alan, sshtylyov, mpatocka, linux-ide, elendil

From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Fri, 23 Oct 2009 20:52:47 +0200

> "Old" one corrupts because somebody thought it would be a smart move
> to remove serialized flag instead of debugging certain problem
> properly..

Bart, I fear you may live your entire life always with some
axe to grind with someone.

It makes it impossible to work with you effectively.

I've tried to start taking the personal attacks out of my
interactions with you, but you seem to be completely unable
to drop it.

If you're constantly bitter about this or that, nobody will
want to work with you.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-22  0:41   ` David Miller
                       ` (2 preceding siblings ...)
  2009-10-23 14:50     ` Mikulas Patocka
@ 2009-10-24 11:28     ` Frans Pop
  2009-10-24 11:31       ` David Miller
  3 siblings, 1 reply; 63+ messages in thread
From: Frans Pop @ 2009-10-24 11:28 UTC (permalink / raw)
  To: David Miller; +Cc: bzolnier, mpatocka, linux-ide

On Thursday 22 October 2009, David Miller wrote:
> The most conservative thing to do is to set the flag as
> is done by mpatocka's patch but I'd like Frans to regression
> test that on his ultra5.
>
> Frans can you do that?

Sure, if you send me the patch (or a link to it). May take a few days.

Also, is it still needed given the whole discussion that happened after 
this mail?

Cheers,
FJP

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-24 11:28     ` Frans Pop
@ 2009-10-24 11:31       ` David Miller
  2009-10-25  2:48         ` Frans Pop
  0 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2009-10-24 11:31 UTC (permalink / raw)
  To: elendil; +Cc: bzolnier, mpatocka, linux-ide

From: Frans Pop <elendil@planet.nl>
Date: Sat, 24 Oct 2009 13:28:39 +0200

> On Thursday 22 October 2009, David Miller wrote:
>> The most conservative thing to do is to set the flag as
>> is done by mpatocka's patch but I'd like Frans to regression
>> test that on his ultra5.
>>
>> Frans can you do that?
> 
> Sure, if you send me the patch (or a link to it). May take a few days.

It's here:

http://patchwork.ozlabs.org/patch/36615/

Also, all pending IDE patches can always be found at:

http://patchwork.ozlabs.org/project/linux-ide/list/

> Also, is it still needed given the whole discussion that happened after 
> this mail?

Yes, I still fully intend to apply mpatocka's patch, as-is.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-24  3:24                     ` David Miller
@ 2009-10-24 12:38                       ` Bartlomiej Zolnierkiewicz
  2009-10-24 12:58                         ` David Miller
  0 siblings, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-10-24 12:38 UTC (permalink / raw)
  To: David Miller; +Cc: alan, sshtylyov, mpatocka, linux-ide, elendil

On Saturday 24 October 2009 05:24:47 David Miller wrote:
> From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date: Fri, 23 Oct 2009 20:52:47 +0200
> 
> > "Old" one corrupts because somebody thought it would be a smart move
> > to remove serialized flag instead of debugging certain problem
> > properly..
> 
> Bart, I fear you may live your entire life always with some
> axe to grind with someone.
> 
> It makes it impossible to work with you effectively.

You mean that I don't enjoy the idea of "just a managers" trying to
"manage" my free time?  Well, I don't.

I also don't buy the definition of "working with" presented by you.

> I've tried to start taking the personal attacks out of my
> interactions with you, but you seem to be completely unable
> to drop it.
> 
> If you're constantly bitter about this or that, nobody will
> want to work with you.

I'm not bitter, I just stick to the facts.

My mails may sound rude sometimes but they _always_ have the backing
in facts (and if I'm wrong I have no problem with saying "sorry, I was
wrong").

If you want to see how (unprovoked) personal attacks look like please
go re-read your own mails from few months ago...

It is all here:

	http://patchwork.ozlabs.org/patch/28945/

Please start with:

| Unlike the commit log message states, I suspect this change
| "introduces" incorrect handling of unexpected IRQs rather than
| "fixing".  I suspect the problem arises when the controller

and

| That's why all the IDE drivers are constantly breaking on sparc.

parts.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-24 12:38                       ` Bartlomiej Zolnierkiewicz
@ 2009-10-24 12:58                         ` David Miller
  2009-10-24 13:13                           ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2009-10-24 12:58 UTC (permalink / raw)
  To: bzolnier; +Cc: alan, sshtylyov, mpatocka, linux-ide, elendil

From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Sat, 24 Oct 2009 14:38:00 +0200

> My mails may sound rude sometimes but they _always_ have the backing
> in facts (and if I'm wrong I have no problem with saying "sorry, I was
> wrong").

Being rude is not justified by being right.

> If you want to see how (unprovoked) personal attacks look like please
> go re-read your own mails from few months ago...

Yes, I am a complete asshole sometimes, and I need to improve
upon that.

Are you possesing such a level of willingness to change too?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-24 12:58                         ` David Miller
@ 2009-10-24 13:13                           ` Bartlomiej Zolnierkiewicz
  2009-10-24 13:20                             ` David Miller
  0 siblings, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-10-24 13:13 UTC (permalink / raw)
  To: David Miller; +Cc: alan, sshtylyov, mpatocka, linux-ide, elendil

On Saturday 24 October 2009 14:58:21 David Miller wrote:
> From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> 
> > If you want to see how (unprovoked) personal attacks look like please
> > go re-read your own mails from few months ago...
> 
> Yes, I am a complete asshole sometimes, and I need to improve
> upon that.

I also try to keep working on this but reality keeps standing in my way. ;)

> Are you possesing such a level of willingness to change too?

Lets just put all past misunderstandings behind and start over..

-- 
Bartlomiej Zolnierkiewicz

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-24 13:13                           ` Bartlomiej Zolnierkiewicz
@ 2009-10-24 13:20                             ` David Miller
  0 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2009-10-24 13:20 UTC (permalink / raw)
  To: bzolnier; +Cc: alan, sshtylyov, mpatocka, linux-ide, elendil

From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Sat, 24 Oct 2009 15:13:12 +0200

> On Saturday 24 October 2009 14:58:21 David Miller wrote:
>> Are you possesing such a level of willingness to change too?
> 
> Lets just put all past misunderstandings behind and start over..

Sounds like a good idea to me :-)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-24 11:31       ` David Miller
@ 2009-10-25  2:48         ` Frans Pop
  2009-10-29 10:02           ` David Miller
  0 siblings, 1 reply; 63+ messages in thread
From: Frans Pop @ 2009-10-25  2:48 UTC (permalink / raw)
  To: David Miller; +Cc: bzolnier, mpatocka, linux-ide

On Saturday 24 October 2009, David Miller wrote:
> > On Thursday 22 October 2009, David Miller wrote:
> >> The most conservative thing to do is to set the flag as
> >> is done by mpatocka's patch but I'd like Frans to regression
> >> test that on his ultra5.
> >>
> >> Frans can you do that?
> >
> > Sure, if you send me the patch (or a link to it). May take a few days.
>
> It's here:
> http://patchwork.ozlabs.org/patch/36615/

Looks to work OK on my system. dmesg shows ide0 and ide1 are now 
serialized.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 20:50       ` Sergei Shtylyov
@ 2009-10-26 11:30         ` Mikulas Patocka
  2009-10-26 18:20           ` Sergei Shtylyov
  0 siblings, 1 reply; 63+ messages in thread
From: Mikulas Patocka @ 2009-10-26 11:30 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: David Miller, bzolnier, linux-ide, elendil

>   Are you sure? Anyway, there's no need as we're reading the interrupt bits
> CFR/ARTTIM23 registers first (at least in the IDE code). Look at the
> cmd*_test_irq() methods and ide_intr().

Maybe the BMIDE status bit is just the same as CFR/ARTTIM23 interrupt bit.

> > there is an unexpected interrupt on the inactive channel --- this should be
> > much more safe than reading the status register. If there is an interrupt,
> > then
> > --- read the status of the inactive channel? (potential data corruption, but
> > it is reported to happen only on boot).
> > --- Or can the interrupt be acknowledged only in BM-status without touching
> > the device? I believe yes,
> 
>   And I believe no. BMIDE status bit doesn't acknoledge (clear, to be precise)
> the IDE interrupts, only the status register read does.

There are two things: IDE interrupt line set by the device (BMIDE status 
doesn't do anything with it) and chipset's INT[A-D] interrupt line --- and 
BMIDE status should clear it, at least for some chipsets.

Some chipset documentation (not for CMD64x) thatI have says that BMIDE 
irq status is set on any interrupt regardless if it's DMA or NONDMA.

On ICH SATA (in legacy non-AHCI mode), it is even required to acknowledge 
PIO interrupts with BMIDE status, otherwise the interrupt stays pending.

> > it shoud shut the PCI interrupt but it would leave the IDE interrupt line on
> > (should be cleared on next command).
> >   
> 
>    I think the negated wired-OR of both INTRQ signals serves as an -INTA
> source, not the BMIDE status bits. At least in the general case, where the
> BMIDE status doesn't reflect PIO mode interrupts.

It is not as simple, INTA and BMIDE status must be postponed until the 
chip flushes its buffers and writes the DMA last byte to the memory.

I agree with you that for some chipsets BMIDE doesn't have to be signalled 
in PIO mode --- but remember that here we are talking about dealing with 
broken devices that set the interrupt line spuriously and about 
serializing chipsets --- not about all chipsets and all devices.

So the best that can be done for such broken devices is to try to shut the 
interrupt in BMIDE register (or PCI registers in CMD64x). There is nothing 
better to do. If you have serializing chipset that doesn't let you shut 
interrupt and the inactive device fires spuriously --- there is absolutely 
nothing that can be done about it.

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-23 18:22                 ` Alan Cox
  2009-10-23 18:52                   ` Bartlomiej Zolnierkiewicz
@ 2009-10-26 11:36                   ` Mikulas Patocka
  2009-10-26 12:18                     ` Alan Cox
  1 sibling, 1 reply; 63+ messages in thread
From: Mikulas Patocka @ 2009-10-26 11:36 UTC (permalink / raw)
  To: Alan Cox
  Cc: Sergei Shtylyov, Bartlomiej Zolnierkiewicz, David Miller,
	linux-ide, elendil

On Fri, 23 Oct 2009, Alan Cox wrote:

> >     Yeah. The libata driver however doesn't make use of that register, 
> > unlike  cmd64x. It doesn't even touch the interrupt bits on PCI0646, only 
> > manipulation the "old" bits on PCI-64[89].
> 
> Yes I purposefully left out all the complexity because when I tested the
> cards I had it simply wasn't needed. I'm also not sure we should merge
> anything from the old to the new one until we know why the old one
> corrupts and the new one merely gets upset. Accidentally porting over a
> corruptor would be very bad indeed.

Well, if Daniela showed that cmd64[36] docs say it needs serialization, 
then just add it there. Don't rely on the fact that the corruption didn't 
happen in my case.

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-26 11:36                   ` Mikulas Patocka
@ 2009-10-26 12:18                     ` Alan Cox
  2009-11-05  1:25                       ` [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD Mikulas Patocka
  0 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2009-10-26 12:18 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Sergei Shtylyov, Bartlomiej Zolnierkiewicz, David Miller,
	linux-ide, elendil

On Mon, 26 Oct 2009 07:36:48 -0400 (EDT)
Mikulas Patocka <mpatocka@redhat.com> wrote:

> On Fri, 23 Oct 2009, Alan Cox wrote:
> 
> > >     Yeah. The libata driver however doesn't make use of that register, 
> > > unlike  cmd64x. It doesn't even touch the interrupt bits on PCI0646, only 
> > > manipulation the "old" bits on PCI-64[89].
> > 
> > Yes I purposefully left out all the complexity because when I tested the
> > cards I had it simply wasn't needed. I'm also not sure we should merge
> > anything from the old to the new one until we know why the old one
> > corrupts and the new one merely gets upset. Accidentally porting over a
> > corruptor would be very bad indeed.
> 
> Well, if Daniela showed that cmd64[36] docs say it needs serialization, 
> then just add it there. Don't rely on the fact that the corruption didn't 
> happen in my case.

Daniela sent me the relevant document - it doesn't exactly match what is
being described here as it relates to motherboards improperly timing out
PCI access requests.

Basically the fix is "don't touch the task file on one channel while DMA
is live on the other". That means that for libata serialize is not
sufficient on a shared IRQ set up as we'll touch ALTSTATUS. Plus it is
overkill as you can have parallel issue PIO commands.

For libata the best way to fix it seems to be to avoid parallel issuing
commands with an ATA_PROT_DMA/ATAPI_PROT_DMA command outstanding, and to
check the IRQ bits. Patch to follow when I get a moment.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-26 11:30         ` Mikulas Patocka
@ 2009-10-26 18:20           ` Sergei Shtylyov
  0 siblings, 0 replies; 63+ messages in thread
From: Sergei Shtylyov @ 2009-10-26 18:20 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, bzolnier, linux-ide, elendil

Hello.

Mikulas Patocka wrote:

>>  Are you sure? Anyway, there's no need as we're reading the interrupt bits
>>CFR/ARTTIM23 registers first (at least in the IDE code). Look at the
>>cmd*_test_irq() methods and ide_intr().

> Maybe the BMIDE status bit is just the same as CFR/ARTTIM23 interrupt bit.

    Maybe -- if it indeed gets set in PIO mode as well. In this case though, 
there's quite little sense in having that bit mirrored (even twice with the 
newer controllers).

>>>there is an unexpected interrupt on the inactive channel --- this should be
>>>much more safe than reading the status register. If there is an interrupt,
>>>then
>>>--- read the status of the inactive channel? (potential data corruption, but
>>>it is reported to happen only on boot).
>>>--- Or can the interrupt be acknowledged only in BM-status without touching
>>>the device? I believe yes,

>>  And I believe no. BMIDE status bit doesn't acknoledge (clear, to be precise)
>>the IDE interrupts, only the status register read does.

> There are two things: IDE interrupt line set by the device (BMIDE status 
> doesn't do anything with it) and chipset's INT[A-D] interrupt line --- and 
> BMIDE status should clear it, at least for some chipsets.

> Some chipset documentation (not for CMD64x) thatI have says that BMIDE 
> irq status is set on any interrupt regardless if it's DMA or NONDMA.

    That is rather untypical behavior although some chipsets like Intel ICH 
are known to do it.

> On ICH SATA (in legacy non-AHCI mode), it is even required to acknowledge 
> PIO interrupts with BMIDE status, otherwise the interrupt stays pending.

>>>it shoud shut the PCI interrupt but it would leave the IDE interrupt line on
>>>(should be cleared on next command).

>>   I think the negated wired-OR of both INTRQ signals serves as an -INTA
>>source, not the BMIDE status bits. At least in the general case, where the
>>BMIDE status doesn't reflect PIO mode interrupts.

> It is not as simple, INTA and BMIDE status must be postponed until the 
> chip flushes its buffers and writes the DMA last byte to the memory.

    I know. The delay logic only acts in the DMA case. And it doesn't have 
to delay the interrupt itself, only the BMIDE status read with bit 2 set -- 
which is achievable by retrying the I/O transaction on PCI until the DMA 
actually completes.

> I agree with you that for some chipsets BMIDE doesn't have to be signalled 
> in PIO mode --- but remember that here we are talking about dealing with 
> broken devices that set the interrupt line spuriously and about 
> serializing chipsets --- not about all chipsets and all devices.

> So the best that can be done for such broken devices is to try to shut the 
> interrupt in BMIDE register (or PCI registers in CMD64x). There is nothing 
> better to do.

    And we're doing it, now for the PIO case also.

> If you have serializing chipset that doesn't let you shut 
> interrupt and the inactive device fires spuriously --- there is absolutely 
> nothing that can be done about it.

    Yes, seems so from ide_intr()...

> Mikulas

WBR, Sergei

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-21 23:01   ` Mikulas Patocka
@ 2009-10-27 11:34     ` Alan Cox
  2009-10-28  1:10       ` Mikulas Patocka
  0 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2009-10-27 11:34 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Mikael Pettersson, David S. Miller, linux-ide

> (ata1 is the primary channel)
> --- I'm wondering, what does it mean? status 0x50 is OK. DMA status 0x24 
> is also OK. What was the problem there?

Beats me still. Thanks to Daniela for the info about the chip contention
I've got some bits that can be tried, but I don't actually have a 646 to
check this.

It should do the neccessary serializing and IRQ bit checks to avoid
hitting the case described in the app note.

cmd64x: implement serialization as per notes

From: Alan Cox <alan@linux.intel.com>

Daniela Engert pointed out that there are some implementation notes for the
643 and 646 that deal with certain serialization rules. In theory we don't
need them because they apply when the motherboard decides not to retry PCI
requests for long enough and the chip is busy doing a DMA transfer on the
other channel.

The rule basically is "don't touch the taskfile of the other channel while
a DMA is in progress". To implement that we need to

- not issue a command on a channel when there is a DMA command queued
- not issue a DMA command on a channel when there are PIO commands queued
- use the alternative access to the interrupt source so that we do not
  touch altstatus or status on shared IRQ.

Signed-off-by: Alan Cox <alan@linux.intel.com>
---

 drivers/ata/pata_cmd64x.c |  132 ++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 124 insertions(+), 8 deletions(-)


diff --git a/drivers/ata/pata_cmd64x.c b/drivers/ata/pata_cmd64x.c
index f98dffe..64917ac 100644
--- a/drivers/ata/pata_cmd64x.c
+++ b/drivers/ata/pata_cmd64x.c
@@ -31,7 +31,7 @@
 #include <linux/libata.h>
 
 #define DRV_NAME "pata_cmd64x"
-#define DRV_VERSION "0.2.5"
+#define DRV_VERSION "0.3.1"
 
 /*
  * CMD64x specific registers definition.
@@ -75,6 +75,13 @@ enum {
 	DTPR1		= 0x7C
 };
 
+struct cmd_priv
+{
+	int dma_live;		/* Channel using DMA */
+	int irq_t[2];		/* Register to check for IRQ */
+	int irq_m[2];		/* Bit to check */
+};
+
 static int cmd648_cable_detect(struct ata_port *ap)
 {
 	struct pci_dev *pdev = to_pci_dev(ap->host->dev);
@@ -254,17 +261,107 @@ static void cmd648_bmdma_stop(struct ata_queued_cmd *qc)
 }
 
 /**
- *	cmd646r1_dma_stop	-	DMA stop callback
+ *	cmd64x_bmdma_stop	-	DMA stop callback
  *	@qc: Command in progress
  *
- *	Stub for now while investigating the r1 quirk in the old driver.
+ *	Track the completion of live DMA commands and clear the dma_live
+ *	tracking flag as we do.
  */
 
-static void cmd646r1_bmdma_stop(struct ata_queued_cmd *qc)
+static void cmd64x_bmdma_stop(struct ata_queued_cmd *qc)
 {
+	struct ata_port *ap = qc->ap;
+	struct cmd_priv *priv = ap->host->private_data;
 	ata_bmdma_stop(qc);
+	WARN_ON(priv->dma_live != ap->port_no );
+	priv->dma_live = -1;
+}
+
+/**
+ *	cmd64x_qc_defer		-	Defer logic for chip limits
+ *	@qc: queued command
+ *
+ *	Decide whether we can issue the command. Called under the host lock.
+ */
+
+static int cmd64x_qc_defer(struct ata_queued_cmd *qc)
+{
+	struct ata_host *host = qc->ap->host;
+	struct cmd_priv *priv = host->private_data;
+	struct ata_port *alt = host->ports[1 ^ qc->ap->port_no];
+	int rc;
+
+	/* Apply the ATA rules first */
+	rc = ata_std_qc_defer(qc);
+	if (rc)
+		return rc;
+
+	/* If the other port is not live then issue the command */
+	if (alt == NULL || !alt->qc_active)
+		return 0;
+	/* If there is a live DMA command then wait */
+	if (priv->dma_live != -1)
+		return 	ATA_DEFER_PORT;
+	if (qc->tf.protocol == ATAPI_PROT_DMA ||
+				qc->tf.protocol == ATA_PROT_DMA) {
+		/* Cannot overlap our DMA command */
+		if (alt->qc_active)
+			return ATA_DEFER_PORT;
+		/* Claim the DMA */
+		priv->dma_live = qc->ap->port_no;
+	}
+	return 0;
 }
 
+/**
+ *	cmd64x_interrupt - ATA host interrupt handler
+ *	@irq: irq line (unused)
+ *	@dev_instance: pointer to our ata_host information structure
+ *
+ *	Our interrupt handler for PCI IDE devices.  Calls
+ *	ata_sff_host_intr() for each port that is flagging an IRQ. We cannot
+ *	use the defaults as we need to avoid touching status/altstatus during
+ *	a DMA.
+ *
+ *	LOCKING:
+ *	Obtains host lock during operation.
+ *
+ *	RETURNS:
+ *	IRQ_NONE or IRQ_HANDLED.
+ */
+irqreturn_t cmd64x_interrupt(int irq, void *dev_instance)
+{
+	struct ata_host *host = dev_instance;
+	struct pci_dev *pdev = to_pci_dev(host->dev);
+	struct cmd_priv *priv = host->private_data;
+	unsigned int i;
+	unsigned int handled = 0;
+	unsigned long flags;
+
+	/* TODO: make _irqsave conditional on x86 PCI IDE legacy mode */
+	spin_lock_irqsave(&host->lock, flags);
+
+	for (i = 0; i < host->n_ports; i++) {
+		struct ata_port *ap;
+		u8 reg;
+
+		pci_read_config_byte(pdev, priv->irq_t[i], &reg);
+		ap = host->ports[i];
+		if (ap && (reg & priv->irq_m[i]) &&
+		    !(ap->flags & ATA_FLAG_DISABLED)) {
+			struct ata_queued_cmd *qc;
+
+			qc = ata_qc_from_tag(ap, ap->link.active_tag);
+			if (qc && (!(qc->tf.flags & ATA_TFLAG_POLLING)) &&
+			    (qc->flags & ATA_QCFLAG_ACTIVE))
+				handled |= ata_sff_host_intr(ap, qc);
+		}
+	}
+
+	spin_unlock_irqrestore(&host->lock, flags);
+
+	return IRQ_RETVAL(handled);
+}
 static struct scsi_host_template cmd64x_sht = {
 	ATA_BMDMA_SHT(DRV_NAME),
 };
@@ -273,6 +370,8 @@ static const struct ata_port_operations cmd64x_base_ops = {
 	.inherits	= &ata_bmdma_port_ops,
 	.set_piomode	= cmd64x_set_piomode,
 	.set_dmamode	= cmd64x_set_dmamode,
+	.bmdma_stop	= cmd64x_bmdma_stop,
+	.qc_defer	= cmd64x_qc_defer,
 };
 
 static struct ata_port_operations cmd64x_port_ops = {
@@ -282,7 +381,6 @@ static struct ata_port_operations cmd64x_port_ops = {
 
 static struct ata_port_operations cmd646r1_port_ops = {
 	.inherits	= &cmd64x_base_ops,
-	.bmdma_stop	= cmd646r1_bmdma_stop,
 	.cable_detect	= ata_cable_40wire,
 };
 
@@ -290,6 +388,7 @@ static struct ata_port_operations cmd648_port_ops = {
 	.inherits	= &cmd64x_base_ops,
 	.bmdma_stop	= cmd648_bmdma_stop,
 	.cable_detect	= cmd648_cable_detect,
+	.qc_defer	= ata_std_qc_defer
 };
 
 static int cmd64x_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
@@ -340,6 +439,8 @@ static int cmd64x_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	const struct ata_port_info *ppi[] = { &cmd_info[id->driver_data], NULL };
 	u8 mrdmode;
 	int rc;
+	struct ata_host *host;
+	struct cmd_priv *cpriv;
 
 	rc = pcim_enable_device(pdev);
 	if (rc)
@@ -348,6 +449,17 @@ static int cmd64x_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	pci_read_config_dword(pdev, PCI_CLASS_REVISION, &class_rev);
 	class_rev &= 0xFF;
 
+	cpriv = devm_kzalloc(&pdev->dev, sizeof(*cpriv), GFP_KERNEL);
+	if (cpriv == NULL)
+		return -ENOMEM;
+	cpriv->dma_live = -1;
+
+	/* Table for IRQ checking */
+	cpriv->irq_t[0] = CFR;
+	cpriv->irq_m[0] = 1 << 2;
+	cpriv->irq_t[1] = ARTTIM23;
+	cpriv->irq_m[1] = 1 << 4;
+
 	if (id->driver_data == 0)	/* 643 */
 		ata_pci_bmdma_clear_simplex(pdev);
 
@@ -360,20 +472,24 @@ static int cmd64x_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 			ppi[0] = &cmd_info[3];
 	}
 
+
 	pci_write_config_byte(pdev, PCI_LATENCY_TIMER, 64);
 	pci_read_config_byte(pdev, MRDMODE, &mrdmode);
 	mrdmode &= ~ 0x30;	/* IRQ set up */
 	mrdmode |= 0x02;	/* Memory read line enable */
 	pci_write_config_byte(pdev, MRDMODE, mrdmode);
 
-	/* Force PIO 0 here.. */
-
 	/* PPC specific fixup copied from old driver */
 #ifdef CONFIG_PPC
 	pci_write_config_byte(pdev, UDIDETCR0, 0xF0);
 #endif
+	rc = ata_pci_sff_prepare_host(pdev, ppi, &host);
+	if (rc)
+		return rc;
+	host->private_data = cpriv;
 
-	return ata_pci_sff_init_one(pdev, ppi, &cmd64x_sht, NULL);
+	pci_set_master(pdev);
+	return ata_pci_sff_activate_host(host, cmd64x_interrupt, &cmd64x_sht);
 }
 
 #ifdef CONFIG_PM

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-27 11:34     ` Alan Cox
@ 2009-10-28  1:10       ` Mikulas Patocka
  0 siblings, 0 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-10-28  1:10 UTC (permalink / raw)
  To: Alan Cox; +Cc: Mikael Pettersson, David S. Miller, linux-ide

With every transfer, the pastchcauses this:

------------[ cut here ]------------
WARNING: at drivers/ata/pata_cmd64x.c:276 cmd64x_bmdma_stop+0x48/0x60()
Modules linked in: nbd sunhme openpromfs sermouse unix
Call Trace:
 [00000000005cab68] cmd64x_bmdma_stop+0x48/0x60
 [00000000005c99b8] ata_sff_host_intr+0x158/0x1e0
 [00000000005cb2f8] cmd64x_interrupt+0x178/0x1a0
 [000000000048354c] handle_IRQ_event+0x6c/0x140
 [0000000000485380] handle_fasteoi_irq+0x80/0x140
 [000000000042a660] handler_irq+0xc0/0x100
 [00000000004208b4] tl0_irq5+0x14/0x20
 [00000000005ec08c] skb_copy_datagram_iovec+0x1ec/0x220
 [0000000010000800] unix_dgram_recvmsg+0xc0/0x300 [unix]
 [00000000005e0ca8] sock_recvmsg+0x88/0xc0
 [00000000005e1f50] SyS_recvfrom+0x70/0xe0
 [0000000000406114] linux_sparc_syscall32+0x34/0x40
---[ end trace 62aae040ef69a03d ]---

Mikulas

On Tue, 27 Oct 2009, Alan Cox wrote:

> > (ata1 is the primary channel)
> > --- I'm wondering, what does it mean? status 0x50 is OK. DMA status 0x24 
> > is also OK. What was the problem there?
> 
> Beats me still. Thanks to Daniela for the info about the chip contention
> I've got some bits that can be tried, but I don't actually have a 646 to
> check this.
> 
> It should do the neccessary serializing and IRQ bit checks to avoid
> hitting the case described in the app note.
> 
> cmd64x: implement serialization as per notes
> 
> From: Alan Cox <alan@linux.intel.com>
> 
> Daniela Engert pointed out that there are some implementation notes for the
> 643 and 646 that deal with certain serialization rules. In theory we don't
> need them because they apply when the motherboard decides not to retry PCI
> requests for long enough and the chip is busy doing a DMA transfer on the
> other channel.
> 
> The rule basically is "don't touch the taskfile of the other channel while
> a DMA is in progress". To implement that we need to
> 
> - not issue a command on a channel when there is a DMA command queued
> - not issue a DMA command on a channel when there are PIO commands queued
> - use the alternative access to the interrupt source so that we do not
>   touch altstatus or status on shared IRQ.
> 
> Signed-off-by: Alan Cox <alan@linux.intel.com>
> ---
> 
>  drivers/ata/pata_cmd64x.c |  132 ++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 124 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/drivers/ata/pata_cmd64x.c b/drivers/ata/pata_cmd64x.c
> index f98dffe..64917ac 100644
> --- a/drivers/ata/pata_cmd64x.c
> +++ b/drivers/ata/pata_cmd64x.c
> @@ -31,7 +31,7 @@
>  #include <linux/libata.h>
>  
>  #define DRV_NAME "pata_cmd64x"
> -#define DRV_VERSION "0.2.5"
> +#define DRV_VERSION "0.3.1"
>  
>  /*
>   * CMD64x specific registers definition.
> @@ -75,6 +75,13 @@ enum {
>  	DTPR1		= 0x7C
>  };
>  
> +struct cmd_priv
> +{
> +	int dma_live;		/* Channel using DMA */
> +	int irq_t[2];		/* Register to check for IRQ */
> +	int irq_m[2];		/* Bit to check */
> +};
> +
>  static int cmd648_cable_detect(struct ata_port *ap)
>  {
>  	struct pci_dev *pdev = to_pci_dev(ap->host->dev);
> @@ -254,17 +261,107 @@ static void cmd648_bmdma_stop(struct ata_queued_cmd *qc)
>  }
>  
>  /**
> - *	cmd646r1_dma_stop	-	DMA stop callback
> + *	cmd64x_bmdma_stop	-	DMA stop callback
>   *	@qc: Command in progress
>   *
> - *	Stub for now while investigating the r1 quirk in the old driver.
> + *	Track the completion of live DMA commands and clear the dma_live
> + *	tracking flag as we do.
>   */
>  
> -static void cmd646r1_bmdma_stop(struct ata_queued_cmd *qc)
> +static void cmd64x_bmdma_stop(struct ata_queued_cmd *qc)
>  {
> +	struct ata_port *ap = qc->ap;
> +	struct cmd_priv *priv = ap->host->private_data;
>  	ata_bmdma_stop(qc);
> +	WARN_ON(priv->dma_live != ap->port_no );
> +	priv->dma_live = -1;
> +}
> +
> +/**
> + *	cmd64x_qc_defer		-	Defer logic for chip limits
> + *	@qc: queued command
> + *
> + *	Decide whether we can issue the command. Called under the host lock.
> + */
> +
> +static int cmd64x_qc_defer(struct ata_queued_cmd *qc)
> +{
> +	struct ata_host *host = qc->ap->host;
> +	struct cmd_priv *priv = host->private_data;
> +	struct ata_port *alt = host->ports[1 ^ qc->ap->port_no];
> +	int rc;
> +
> +	/* Apply the ATA rules first */
> +	rc = ata_std_qc_defer(qc);
> +	if (rc)
> +		return rc;
> +
> +	/* If the other port is not live then issue the command */
> +	if (alt == NULL || !alt->qc_active)
> +		return 0;
> +	/* If there is a live DMA command then wait */
> +	if (priv->dma_live != -1)
> +		return 	ATA_DEFER_PORT;
> +	if (qc->tf.protocol == ATAPI_PROT_DMA ||
> +				qc->tf.protocol == ATA_PROT_DMA) {
> +		/* Cannot overlap our DMA command */
> +		if (alt->qc_active)
> +			return ATA_DEFER_PORT;
> +		/* Claim the DMA */
> +		priv->dma_live = qc->ap->port_no;
> +	}
> +	return 0;
>  }
>  
> +/**
> + *	cmd64x_interrupt - ATA host interrupt handler
> + *	@irq: irq line (unused)
> + *	@dev_instance: pointer to our ata_host information structure
> + *
> + *	Our interrupt handler for PCI IDE devices.  Calls
> + *	ata_sff_host_intr() for each port that is flagging an IRQ. We cannot
> + *	use the defaults as we need to avoid touching status/altstatus during
> + *	a DMA.
> + *
> + *	LOCKING:
> + *	Obtains host lock during operation.
> + *
> + *	RETURNS:
> + *	IRQ_NONE or IRQ_HANDLED.
> + */
> +irqreturn_t cmd64x_interrupt(int irq, void *dev_instance)
> +{
> +	struct ata_host *host = dev_instance;
> +	struct pci_dev *pdev = to_pci_dev(host->dev);
> +	struct cmd_priv *priv = host->private_data;
> +	unsigned int i;
> +	unsigned int handled = 0;
> +	unsigned long flags;
> +
> +	/* TODO: make _irqsave conditional on x86 PCI IDE legacy mode */
> +	spin_lock_irqsave(&host->lock, flags);
> +
> +	for (i = 0; i < host->n_ports; i++) {
> +		struct ata_port *ap;
> +		u8 reg;
> +
> +		pci_read_config_byte(pdev, priv->irq_t[i], &reg);
> +		ap = host->ports[i];
> +		if (ap && (reg & priv->irq_m[i]) &&
> +		    !(ap->flags & ATA_FLAG_DISABLED)) {
> +			struct ata_queued_cmd *qc;
> +
> +			qc = ata_qc_from_tag(ap, ap->link.active_tag);
> +			if (qc && (!(qc->tf.flags & ATA_TFLAG_POLLING)) &&
> +			    (qc->flags & ATA_QCFLAG_ACTIVE))
> +				handled |= ata_sff_host_intr(ap, qc);
> +		}
> +	}
> +
> +	spin_unlock_irqrestore(&host->lock, flags);
> +
> +	return IRQ_RETVAL(handled);
> +}
>  static struct scsi_host_template cmd64x_sht = {
>  	ATA_BMDMA_SHT(DRV_NAME),
>  };
> @@ -273,6 +370,8 @@ static const struct ata_port_operations cmd64x_base_ops = {
>  	.inherits	= &ata_bmdma_port_ops,
>  	.set_piomode	= cmd64x_set_piomode,
>  	.set_dmamode	= cmd64x_set_dmamode,
> +	.bmdma_stop	= cmd64x_bmdma_stop,
> +	.qc_defer	= cmd64x_qc_defer,
>  };
>  
>  static struct ata_port_operations cmd64x_port_ops = {
> @@ -282,7 +381,6 @@ static struct ata_port_operations cmd64x_port_ops = {
>  
>  static struct ata_port_operations cmd646r1_port_ops = {
>  	.inherits	= &cmd64x_base_ops,
> -	.bmdma_stop	= cmd646r1_bmdma_stop,
>  	.cable_detect	= ata_cable_40wire,
>  };
>  
> @@ -290,6 +388,7 @@ static struct ata_port_operations cmd648_port_ops = {
>  	.inherits	= &cmd64x_base_ops,
>  	.bmdma_stop	= cmd648_bmdma_stop,
>  	.cable_detect	= cmd648_cable_detect,
> +	.qc_defer	= ata_std_qc_defer
>  };
>  
>  static int cmd64x_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
> @@ -340,6 +439,8 @@ static int cmd64x_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
>  	const struct ata_port_info *ppi[] = { &cmd_info[id->driver_data], NULL };
>  	u8 mrdmode;
>  	int rc;
> +	struct ata_host *host;
> +	struct cmd_priv *cpriv;
>  
>  	rc = pcim_enable_device(pdev);
>  	if (rc)
> @@ -348,6 +449,17 @@ static int cmd64x_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
>  	pci_read_config_dword(pdev, PCI_CLASS_REVISION, &class_rev);
>  	class_rev &= 0xFF;
>  
> +	cpriv = devm_kzalloc(&pdev->dev, sizeof(*cpriv), GFP_KERNEL);
> +	if (cpriv == NULL)
> +		return -ENOMEM;
> +	cpriv->dma_live = -1;
> +
> +	/* Table for IRQ checking */
> +	cpriv->irq_t[0] = CFR;
> +	cpriv->irq_m[0] = 1 << 2;
> +	cpriv->irq_t[1] = ARTTIM23;
> +	cpriv->irq_m[1] = 1 << 4;
> +
>  	if (id->driver_data == 0)	/* 643 */
>  		ata_pci_bmdma_clear_simplex(pdev);
>  
> @@ -360,20 +472,24 @@ static int cmd64x_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
>  			ppi[0] = &cmd_info[3];
>  	}
>  
> +
>  	pci_write_config_byte(pdev, PCI_LATENCY_TIMER, 64);
>  	pci_read_config_byte(pdev, MRDMODE, &mrdmode);
>  	mrdmode &= ~ 0x30;	/* IRQ set up */
>  	mrdmode |= 0x02;	/* Memory read line enable */
>  	pci_write_config_byte(pdev, MRDMODE, mrdmode);
>  
> -	/* Force PIO 0 here.. */
> -
>  	/* PPC specific fixup copied from old driver */
>  #ifdef CONFIG_PPC
>  	pci_write_config_byte(pdev, UDIDETCR0, 0xF0);
>  #endif
> +	rc = ata_pci_sff_prepare_host(pdev, ppi, &host);
> +	if (rc)
> +		return rc;
> +	host->private_data = cpriv;
>  
> -	return ata_pci_sff_init_one(pdev, ppi, &cmd64x_sht, NULL);
> +	pci_set_master(pdev);
> +	return ata_pci_sff_activate_host(host, cmd64x_interrupt, &cmd64x_sht);
>  }
>  
>  #ifdef CONFIG_PM
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD
  2009-10-25  2:48         ` Frans Pop
@ 2009-10-29 10:02           ` David Miller
  0 siblings, 0 replies; 63+ messages in thread
From: David Miller @ 2009-10-29 10:02 UTC (permalink / raw)
  To: elendil; +Cc: bzolnier, mpatocka, linux-ide

From: Frans Pop <elendil@planet.nl>
Date: Sun, 25 Oct 2009 03:48:32 +0100

> On Saturday 24 October 2009, David Miller wrote:
>> > On Thursday 22 October 2009, David Miller wrote:
>> >> The most conservative thing to do is to set the flag as
>> >> is done by mpatocka's patch but I'd like Frans to regression
>> >> test that on his ultra5.
>> >>
>> >> Frans can you do that?
>> >
>> > Sure, if you send me the patch (or a link to it). May take a few days.
>>
>> It's here:
>> http://patchwork.ozlabs.org/patch/36615/
> 
> Looks to work OK on my system. dmesg shows ide0 and ide1 are now 
> serialized.

Thanks a lot for testing Frans, I've applied Mikulas's patch
to ide-2.6

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-10-26 12:18                     ` Alan Cox
@ 2009-11-05  1:25                       ` Mikulas Patocka
  2009-11-05 10:40                         ` Alan Cox
                                           ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-11-05  1:25 UTC (permalink / raw)
  To: David Miller; +Cc: linux-ide, Alan Cox

Hi

Here is another patch for SSD for VIA UDMA33. Alan, please backport it 
into libata.

Mikulas

---

Don't use UDMA on VIA UDMA33 controller with Transcend SSD

The computer locks up if Transcend SSD runs in any of UDMA modes.
It doesn't lockup with different brand SSD, so this is specific to Transcend.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/ide/via82cxxx.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6.31.5-fast/drivers/ide/via82cxxx.c
===================================================================
--- linux-2.6.31.5-fast.orig/drivers/ide/via82cxxx.c	2009-05-29 22:36:58.000000000 +0200
+++ linux-2.6.31.5-fast/drivers/ide/via82cxxx.c	2009-11-04 22:32:55.000000000 +0100
@@ -195,6 +195,21 @@ static void via_set_pio_mode(ide_drive_t
 	via_set_drive(drive, XFER_PIO_0 + pio);
 }
 
+static u8 via_udma_filter(ide_drive_t *drive)
+{
+	char *m = (char *)&drive->id[ATA_ID_PROD];
+
+	/*
+	 * Restrict UDMA for Transcend flash cards.
+	 * On VIA 33, UDMA locks up. On VIA 133, it works. I can't test other
+	 * controllers.
+	 */
+	if (!memcmp(m, "TS", 2) && drive->hwif->ultra_mask == ATA_UDMA2)
+		return 0;
+
+	return drive->hwif->ultra_mask;
+}
+
 static struct via_isa_bridge *via_config_find(struct pci_dev **isa)
 {
 	struct via_isa_bridge *via_config;
@@ -372,6 +387,7 @@ static const struct ide_port_ops via_por
 	.set_pio_mode		= via_set_pio_mode,
 	.set_dma_mode		= via_set_drive,
 	.cable_detect		= via82cxxx_cable_detect,
+	.udma_filter		= via_udma_filter,
 };
 
 static const struct ide_port_info via82cxxx_chipset __devinitdata = {

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-05  1:25                       ` [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD Mikulas Patocka
@ 2009-11-05 10:40                         ` Alan Cox
  2009-11-05 22:18                           ` Mikulas Patocka
  2009-11-17 12:30                         ` David Miller
  2010-01-14 15:49                         ` Bartlomiej Zolnierkiewicz
  2 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2009-11-05 10:40 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, linux-ide

On Wed, 4 Nov 2009 20:25:21 -0500 (EST)
Mikulas Patocka <mpatocka@redhat.com> wrote:

> Hi
> 
> Here is another patch for SSD for VIA UDMA33. Alan, please backport it 
> into libata.

I might consider adding the TS device to the blacklist on libata IFF
- I have the full ID string so I can match device accurately not blanket
  cripple hardware
- I had a much more detailed bug report - eg exact dmesg data up to the
  lock
- I know which exact controller card(s) it occurs on (rather than
  blanket "via")
- If I knew it occurred on libata
- If I knew it wasn't an IDE and/or shared IDE/libata bug or didn't
  appear to be one.

Crippling every other transcend device user on the planet for your
specific report (that nobody else has so far reported to me) would almost
certainly be a regression on a massive scale.

Does your TS device work with other controllers - do you just have a
faulty unit or marginal cabling ?

For any real failure case like this I'd expect a pile of entries in
bugzillas from multiple people. Now it could simply be yours is the first
in which case I'll look at it a blacklist when I see a few more.

Alan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-05 10:40                         ` Alan Cox
@ 2009-11-05 22:18                           ` Mikulas Patocka
  2009-11-05 22:46                             ` Alan Cox
  0 siblings, 1 reply; 63+ messages in thread
From: Mikulas Patocka @ 2009-11-05 22:18 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Miller, linux-ide, Daniela Engert



On Thu, 5 Nov 2009, Alan Cox wrote:

> On Wed, 4 Nov 2009 20:25:21 -0500 (EST)
> Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> > Hi
> > 
> > Here is another patch for SSD for VIA UDMA33. Alan, please backport it 
> > into libata.
> 
> I might consider adding the TS device to the blacklist on libata IFF
> - I have the full ID string so I can match device accurately not blanket
>   cripple hardware
> - I had a much more detailed bug report - eg exact dmesg data up to the
>   lock
> - I know which exact controller card(s) it occurs on (rather than
>   blanket "via")
> - If I knew it occurred on libata
> - If I knew it wasn't an IDE and/or shared IDE/libata bug or didn't
>   appear to be one.
> 
> Crippling every other transcend device user on the planet for your
> specific report (that nobody else has so far reported to me) would almost
> certainly be a regression on a massive scale.
> 
> Does your TS device work with other controllers - do you just have a
> faulty unit or marginal cabling ?
> 
> For any real failure case like this I'd expect a pile of entries in
> bugzillas from multiple people. Now it could simply be yours is the first
> in which case I'll look at it a blacklist when I see a few more.
> 
> Alan

The device ID string is: TS64GSSD25-M, serial 002328450083, revision V0826

The south bridge is VIA VT82C586B

The lockup happens on partition scan, sometimes the scan passes and it 
locks up when loading init. It doesn't get further.

The lockup is not dependent on IDE driver. I experienced it with Linux 2.6 
IDE, Linux 2.6 LIBATA, Daniela's OS/2 DANIS506 driver (Daniela, if you 
maintain a blacklist, you can add it there too) - switching to MWDMA 
helped. Linux 2.2 locked up too (it doesn't set transfer mode and relies 
on BIOS) and disabling UDMA in BIOS cured the problem.

I tried to switch to UDMA 0 and UDMA 1, but the lockups didn't go away, 
they just became less frequent. With MWDMA 2 it works reliably.

I tested it in a motherboard with VIA133 IDE and it works there. I think 
it also worked with that VT6421 card, but I can't test it now because I 
moved root to that Transcend SSD and it wouldn't boot of that.

This is the list of PCI devices for the detailed information (note that 
onboard IDE locks-up, not that additional VT6421 controller on a PCI 
card).

Mikulas

BUS 0, DEV 0, FN 0
1106 0597 0000 0000 06 00 00 03
        Bridge - Host bridge
        VIA Technologies, Inc. - VT82C597 [Apollo VP3]
BUS 0, DEV 1, FN 0
1106 8598 0000 0000 06 04 00 00
        Bridge - PCI bridge - Normal decode
        VIA Technologies, Inc. - VT82C598/694x [Apollo MVP3/Pro133x AGP]
BUS 0, DEV 7, FN 0
1106 0586 0000 0000 06 01 00 41
        Bridge - ISA bridge
        VIA Technologies, Inc. - VT82C586/A/B PCI-to-ISA [Apollo VP]
BUS 0, DEV 7, FN 1 (CLAIMED)
1106 0571 0000 0000 01 01 8A 06
        Mass storage controller - IDE interface
        VIA Technologies, Inc. - VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC 
Bus Master IDE
BUS 0, DEV 7, FN 2 (CLAIMED)
1106 3038 0925 1234 0C 03 00 02
        Serial bus controller - USB Controller - UHCI
        VIA Technologies, Inc. - VT82xxxxx UHCI USB 1.1 Controller - USB 
Controller
BUS 0, DEV 7, FN 3
1106 3040 0000 0000 06 04 00 10
        Bridge - PCI bridge - Normal decode
        VIA Technologies, Inc. - VT82C586B ACPI
BUS 0, DEV 8, FN 0 (CLAIMED)
1101 1060 1101 1060 01 00 00 01
        Mass storage controller - SCSI storage controller
        Initio Corporation - INI-A100U2W
BUS 0, DEV 9, FN 0 (CLAIMED)
1106 3038 1106 3038 0C 03 00 62
        Serial bus controller - USB Controller - UHCI
        VIA Technologies, Inc. - VT82xxxxx UHCI USB 1.1 Controller
BUS 0, DEV 9, FN 1 (CLAIMED)
1106 3038 1106 3038 0C 03 00 62
        Serial bus controller - USB Controller - UHCI
        VIA Technologies, Inc. - VT82xxxxx UHCI USB 1.1 Controller
BUS 0, DEV 9, FN 2
1106 3104 1106 3104 0C 03 20 65
        Serial bus controller - USB Controller - EHCI
        VIA Technologies, Inc. - USB 2.0
BUS 0, DEV 9, FN 3 (CLAIMED)
1106 3249 1106 3249 01 04 00 50
        Mass storage controller - RAID bus controller
        VIA Technologies, Inc. - VT6421 IDE RAID Controller
BUS 0, DEV 10, FN 0 (CLAIMED)
8086 1076 8086 1176 02 00 00 00
        Network controller - Ethernet controller
        Intel Corporation - 82541GI Gigabit Ethernet Controller - PRO/1000 
MT Desktop Adapter
BUS 1, DEV 0, FN 0 (CLAIMED)
5333 8A13 5333 8A13 03 00 00 02
        Display controller - VGA compatible controller - VGA
        S3 Inc. - 86c368 [Trio 3D/2X] - Trio3D/2X


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-05 22:18                           ` Mikulas Patocka
@ 2009-11-05 22:46                             ` Alan Cox
  2009-11-05 23:19                               ` Mikulas Patocka
  0 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2009-11-05 22:46 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, linux-ide, Daniela Engert

> The device ID string is: TS64GSSD25-M, serial 002328450083, revision V0826
> The south bridge is VIA VT82C586B
> 
> The lockup happens on partition scan, sometimes the scan passes and it 
> locks up when loading init. It doesn't get further.

The Linux driver doesn't set all the time setup strictly but it does set
it to be a safer maximum value. That might be worth checking but it would
still be a device naughty if so.

> The lockup is not dependent on IDE driver. I experienced it with Linux 2.6 
> IDE, Linux 2.6 LIBATA, Daniela's OS/2 DANIS506 driver (Daniela, if you 
> maintain a blacklist, you can add it there too) - switching to MWDMA 
> helped. Linux 2.2 locked up too (it doesn't set transfer mode and relies 
> on BIOS) and disabling UDMA in BIOS cured the problem.

> I tested it in a motherboard with VIA133 IDE and it works there. I think 
> it also worked with that VT6421 card, but I can't test it now because I 
> moved root to that Transcend SSD and it wouldn't boot of that.

Thanks - so basically its a specific card/device combination (or a dud
device you own), that makes it easier as any blacklisting can be very
much narrower.

> 1106 8598 0000 0000 06 04 00 00
>         Bridge - PCI bridge - Normal decode
>         VIA Technologies, Inc. - VT82C598/694x [Apollo MVP3/Pro133x AGP]

Ok so its also a dinosaur which might explain the lack of other reports.


Thanks

Alan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-05 22:46                             ` Alan Cox
@ 2009-11-05 23:19                               ` Mikulas Patocka
  0 siblings, 0 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-11-05 23:19 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Miller, linux-ide, Daniela Engert



On Thu, 5 Nov 2009, Alan Cox wrote:

> > The device ID string is: TS64GSSD25-M, serial 002328450083, revision V0826
> > The south bridge is VIA VT82C586B
> > 
> > The lockup happens on partition scan, sometimes the scan passes and it 
> > locks up when loading init. It doesn't get further.
> 
> The Linux driver doesn't set all the time setup strictly but it does set
> it to be a safer maximum value. That might be worth checking but it would
> still be a device naughty if so.

I tried UDMA 0 and 1 and no help. I didn't try to fiddle the timing 
register manually. You mean, try set the disk to UDMA 2 and set the 
register to a lower timing? Or the other way? I can try ...

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-05  1:25                       ` [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD Mikulas Patocka
  2009-11-05 10:40                         ` Alan Cox
@ 2009-11-17 12:30                         ` David Miller
  2009-11-18 17:09                           ` Mikulas Patocka
  2010-01-14 15:49                         ` Bartlomiej Zolnierkiewicz
  2 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2009-11-17 12:30 UTC (permalink / raw)
  To: mpatocka; +Cc: linux-ide, alan

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed, 4 Nov 2009 20:25:21 -0500 (EST)

> Don't use UDMA on VIA UDMA33 controller with Transcend SSD
> 
> The computer locks up if Transcend SSD runs in any of UDMA modes.
> It doesn't lockup with different brand SSD, so this is specific to Transcend.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Mikulas, I'm happy to apply this if you match on the full
ID string, not just "TS".

Please update your patch and I'll push it upstream.

Thank you.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-17 12:30                         ` David Miller
@ 2009-11-18 17:09                           ` Mikulas Patocka
  2009-11-18 17:22                             ` Alan Cox
  2011-10-11 17:12                             ` Bartlomiej Zolnierkiewicz
  0 siblings, 2 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-11-18 17:09 UTC (permalink / raw)
  To: David Miller; +Cc: linux-ide, alan

On Tue, 17 Nov 2009, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Wed, 4 Nov 2009 20:25:21 -0500 (EST)
> 
> > Don't use UDMA on VIA UDMA33 controller with Transcend SSD
> > 
> > The computer locks up if Transcend SSD runs in any of UDMA modes.
> > It doesn't lockup with different brand SSD, so this is specific to Transcend.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> Mikulas, I'm happy to apply this if you match on the full
> ID string, not just "TS".
> 
> Please update your patch and I'll push it upstream.
> 
> Thank you.

Transcend makes various versions of their SSDs and all begin with TS. You 
can assume that Transcend SSDs with different capacity or format won't 
work too because they likely use the same controller. The naming is this:

TS64GSSD25-M
TS: Transcend
64G: capacity
SSD25: 2.5" format
-M: MLC

So the problem is that if you match against the full string, you are going 
to miss the other Transcend devices and the patch becomes quite useless.

If you want to harden it against false negatives, you can grep the string 
for "SSD", as in the patch below, but there is not anything better to do 
--- if you include "64G" in the string, you fail on non-64G devices, if 
you include "SSD25", you fail on 1.8" devices, if you include "-M", you 
fail on SLC.

Mikulas

---

Don't use UDMA on VIA UDMA33 controller with Transcend SSD

The computer locks up after if Transcend SSD runs in any of UDMA modes.
It doesn't lockup with different brand SSD, so this is specific to Transcend.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/ide/via82cxxx.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Index: linux-2.6.31.6-fast/drivers/ide/via82cxxx.c
===================================================================
--- linux-2.6.31.6-fast.orig/drivers/ide/via82cxxx.c	2009-11-16 13:08:04.000000000 +0100
+++ linux-2.6.31.6-fast/drivers/ide/via82cxxx.c	2009-11-18 17:57:22.000000000 +0100
@@ -195,6 +195,22 @@ static void via_set_pio_mode(ide_drive_t
 	via_set_drive(drive, XFER_PIO_0 + pio);
 }
 
+static u8 via_udma_filter(ide_drive_t *drive)
+{
+	char *m = (char *)&drive->id[ATA_ID_PROD];
+
+	/*
+	 * Restrict UDMA for Transcend flash cards.
+	 * On VIA 33, UDMA locks up. On VIA 133, it works. I can't test other
+	 * controllers.
+	 */
+	if (!memcmp(m, "TS", 2) && strstr(m, "SSD") &&
+	    drive->hwif->ultra_mask == ATA_UDMA2)
+		return 0;
+
+	return drive->hwif->ultra_mask;
+}
+
 static struct via_isa_bridge *via_config_find(struct pci_dev **isa)
 {
 	struct via_isa_bridge *via_config;
@@ -372,6 +388,7 @@ static const struct ide_port_ops via_por
 	.set_pio_mode		= via_set_pio_mode,
 	.set_dma_mode		= via_set_drive,
 	.cable_detect		= via82cxxx_cable_detect,
+	.udma_filter		= via_udma_filter,
 };
 
 static const struct ide_port_info via82cxxx_chipset __devinitdata = {

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:09                           ` Mikulas Patocka
@ 2009-11-18 17:22                             ` Alan Cox
  2009-11-18 17:32                               ` David Miller
  2009-11-18 17:37                               ` Mikulas Patocka
  2011-10-11 17:12                             ` Bartlomiej Zolnierkiewicz
  1 sibling, 2 replies; 63+ messages in thread
From: Alan Cox @ 2009-11-18 17:22 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, linux-ide

> Transcend makes various versions of their SSDs and all begin with TS. You 
> can assume that Transcend SSDs with different capacity or format won't 
> work too because they likely use the same controller. The naming is this:

You don't want to assume that because then you will never find out if
there is no problem.

> So the problem is that if you match against the full string, you are going 
> to miss the other Transcend devices and the patch becomes quite useless.

Quite the reverse. It's not implausible that only one or two devices are
affected, but match TS SSD is going to match zillions of devices of many
generations for years to come. Thats not good.

Experience is also that very few of the existing blacklist entries we
have grow to be large scale wildcards.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:22                             ` Alan Cox
@ 2009-11-18 17:32                               ` David Miller
  2009-11-18 17:46                                 ` Mikulas Patocka
  2009-11-18 17:37                               ` Mikulas Patocka
  1 sibling, 1 reply; 63+ messages in thread
From: David Miller @ 2009-11-18 17:32 UTC (permalink / raw)
  To: alan; +Cc: mpatocka, linux-ide

From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Wed, 18 Nov 2009 17:22:58 +0000

>> Transcend makes various versions of their SSDs and all begin with TS. You 
>> can assume that Transcend SSDs with different capacity or format won't 
>> work too because they likely use the same controller. The naming is this:
> 
> You don't want to assume that because then you will never find out if
> there is no problem.

Agreed.

And if Alan's patch for the ATA layer is going to use an exact string
match, which it is, and we should do the same in the IDE layer to be
consistent.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:22                             ` Alan Cox
  2009-11-18 17:32                               ` David Miller
@ 2009-11-18 17:37                               ` Mikulas Patocka
  2009-11-18 17:50                                 ` Alan Cox
  1 sibling, 1 reply; 63+ messages in thread
From: Mikulas Patocka @ 2009-11-18 17:37 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Miller, linux-ide

> > Transcend makes various versions of their SSDs and all begin with TS. You 
> > can assume that Transcend SSDs with different capacity or format won't 
> > work too because they likely use the same controller. The naming is this:
> 
> You don't want to assume that because then you will never find out if
> there is no problem.

I can be certain that no company would be stupid enough to design a 
separate controller for each capacity and form factor of their devices. 
They use one or a very few chips.

Furthemore, even if they upgrade the chip and produce 64G 2.5" MLC device 
with the new working chip, you can't tell it from the ID string at all.

> > So the problem is that if you match against the full string, you are going 
> > to miss the other Transcend devices and the patch becomes quite useless.
> 
> Quite the reverse. It's not implausible that only one or two devices are
> affected, but match TS SSD is going to match zillions of devices of many
> generations for years to come. Thats not good.

Why no good? People just get lower performance (on an old motherboard, 
where no one expects high performance anyway).

On the other hand, if you miss a device from the blacklist, people get 
crashes.

Crashes are worse than lower performance.

> Experience is also that very few of the existing blacklist entries we
> have grow to be large scale wildcards.

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:32                               ` David Miller
@ 2009-11-18 17:46                                 ` Mikulas Patocka
  2009-11-18 17:53                                   ` David Miller
  0 siblings, 1 reply; 63+ messages in thread
From: Mikulas Patocka @ 2009-11-18 17:46 UTC (permalink / raw)
  To: David Miller; +Cc: alan, linux-ide



On Wed, 18 Nov 2009, David Miller wrote:

> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> Date: Wed, 18 Nov 2009 17:22:58 +0000
> 
> >> Transcend makes various versions of their SSDs and all begin with TS. You 
> >> can assume that Transcend SSDs with different capacity or format won't 
> >> work too because they likely use the same controller. The naming is this:
> > 
> > You don't want to assume that because then you will never find out if
> > there is no problem.
> 
> Agreed.
> 
> And if Alan's patch for the ATA layer is going to use an exact string
> match, which it is, and we should do the same in the IDE layer to be
> consistent.

And what will you do with 16G or 32G versions of the same device? You 
really think they use different controllers and are not affected?

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:37                               ` Mikulas Patocka
@ 2009-11-18 17:50                                 ` Alan Cox
  2009-11-18 18:02                                   ` Mikulas Patocka
  0 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2009-11-18 17:50 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, linux-ide

> I can be certain that no company would be stupid enough to design a 
> separate controller for each capacity and form factor of their devices. 
> They use one or a very few chips.

But they do fix the firmware, and you see that with hard disks as well.

> Why no good? People just get lower performance (on an old motherboard, 
> where no one expects high performance anyway).

Which they don't notice or report
> 
> On the other hand, if you miss a device from the blacklist, people get 
> crashes.

Which they do notice or report (and in the meantime can boot with
libata.dma=0 or one of the MWDMA settings).

> Crashes are worse than lower performance.

Not always. Inflicting poor performance (and also *much* lower data
integrity) on people isn't a good idea in this case IMHO. That's based
upon experience with disks and what gets fixed.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:46                                 ` Mikulas Patocka
@ 2009-11-18 17:53                                   ` David Miller
  2009-11-18 18:04                                     ` Mikulas Patocka
  0 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2009-11-18 17:53 UTC (permalink / raw)
  To: mpatocka; +Cc: alan, linux-ide

From: Mikulas Patocka <mpatocka@redhat.com>
Date: Wed, 18 Nov 2009 12:46:37 -0500 (EST)

> And what will you do with 16G or 32G versions of the same device? You 
> really think they use different controllers and are not affected?

Firmware changes often.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:50                                 ` Alan Cox
@ 2009-11-18 18:02                                   ` Mikulas Patocka
  0 siblings, 0 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-11-18 18:02 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Miller, linux-ide

On Wed, 18 Nov 2009, Alan Cox wrote:

> > I can be certain that no company would be stupid enough to design a 
> > separate controller for each capacity and form factor of their devices. 
> > They use one or a very few chips.
> 
> But they do fix the firmware, and you see that with hard disks as well.

So match it against a firmware revision (V0826 in my case, so blacklist 
all lower).

Anyway, I think it is bug on the motherboard because similar lockups can 
be seen with one WD 40G disk operating at UDMA0 or UDMA1 (UDMA2 is OK). 
And if this is old buggy motherboard, Transcend will fix nothing.

> > Why no good? People just get lower performance (on an old motherboard, 
> > where no one expects high performance anyway).
> 
> Which they don't notice or report

If they don't notice it, it is not a problem :)

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:53                                   ` David Miller
@ 2009-11-18 18:04                                     ` Mikulas Patocka
  0 siblings, 0 replies; 63+ messages in thread
From: Mikulas Patocka @ 2009-11-18 18:04 UTC (permalink / raw)
  To: David Miller; +Cc: alan, linux-ide

On Wed, 18 Nov 2009, David Miller wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
> Date: Wed, 18 Nov 2009 12:46:37 -0500 (EST)
> 
> > And what will you do with 16G or 32G versions of the same device? You 
> > really think they use different controllers and are not affected?
> 
> Firmware changes often.

So use the firmware revision field to match it.

Mikulas

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-05  1:25                       ` [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD Mikulas Patocka
  2009-11-05 10:40                         ` Alan Cox
  2009-11-17 12:30                         ` David Miller
@ 2010-01-14 15:49                         ` Bartlomiej Zolnierkiewicz
  2010-01-14 19:24                           ` Alan Cox
  2 siblings, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2010-01-14 15:49 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, linux-ide, Alan Cox

On Thursday 05 November 2009 02:25:21 am Mikulas Patocka wrote:
> Hi
> 
> Here is another patch for SSD for VIA UDMA33. Alan, please backport it 
> into libata.
> 
> Mikulas
> 
> ---
> 
> Don't use UDMA on VIA UDMA33 controller with Transcend SSD
> 
> The computer locks up if Transcend SSD runs in any of UDMA modes.
> It doesn't lockup with different brand SSD, so this is specific to Transcend.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

FWIW the following modified version of your patch is now in my atang tree:

From: Mikulas Patocka <mpatocka@redhat.com>
Subject: [PATCH] via82cxxx: don't use UDMA on VIA UDMA33 controller with Transcend SSD

Don't use UDMA on VIA UDMA33 controller with Transcend SSD.

The computer locks up if Transcend SSD runs in any of UDMA modes.
It doesn't lockup with different brand SSD, so this is specific to Transcend.

bzolnier:
- limit it to VT82C586A/B + TS64GSSD25-M (per commit 10734fc) for now
- add warning message

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
---
 drivers/ide/via82cxxx.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Index: b/drivers/ide/via82cxxx.c
===================================================================
--- a/drivers/ide/via82cxxx.c
+++ b/drivers/ide/via82cxxx.c
@@ -195,6 +195,24 @@ static void via_set_pio_mode(ide_drive_t
 	via_set_drive(drive, XFER_PIO_0 + pio);
 }
 
+static u8 via_udma_filter(ide_drive_t *drive)
+{
+	ide_hwif_t *hwif = drive->hwif;
+	struct pci_dev *dev = to_pci_dev(hwif->dev);
+	struct ide_host *host = pci_get_drvdata(dev);
+	struct via82cxxx_dev *vdev = host->host_priv;
+	char *m = (char *)&drive->id[ATA_ID_PROD];
+
+	if (vdev->via_config->id == PCI_DEVICE_ID_VIA_82C586_0 &&
+	    strcmp(m, "TS64GSSD25-M") == 0) {
+		printk(KERN_WARNING "%s: disabling UDMA mode due to reported "
+			"lockups with this device.\n", drive->name);
+		return 0;
+	}
+
+	return hwif->ultra_mask;
+}
+
 static struct via_isa_bridge *via_config_find(struct pci_dev **isa)
 {
 	struct via_isa_bridge *via_config;
@@ -372,6 +390,7 @@ static const struct ide_port_ops via_por
 	.set_pio_mode		= via_set_pio_mode,
 	.set_dma_mode		= via_set_drive,
 	.cable_detect		= via82cxxx_cable_detect,
+	.udma_filter		= via_udma_filter,
 };
 
 static const struct ide_port_info via82cxxx_chipset __devinitdata = {

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2010-01-14 15:49                         ` Bartlomiej Zolnierkiewicz
@ 2010-01-14 19:24                           ` Alan Cox
  2010-01-14 20:17                             ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2010-01-14 19:24 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: Mikulas Patocka, David Miller, linux-ide

> The computer locks up if Transcend SSD runs in any of UDMA modes.
> It doesn't lockup with different brand SSD, so this is specific to Transcend.
> 
> bzolnier:
> - limit it to VT82C586A/B + TS64GSSD25-M (per commit 10734fc) for now
> - add warning message

Looks good to me.


> +		printk(KERN_WARNING "%s: disabling UDMA mode due to reported "
> +			"lockups with this device.\n", drive->name);

That sounds odd - I think I'd have put ": not using UDMA mode as lockups
have been reported with this device" or similar.

Ackity-ack

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2010-01-14 19:24                           ` Alan Cox
@ 2010-01-14 20:17                             ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2010-01-14 20:17 UTC (permalink / raw)
  To: Alan Cox; +Cc: Mikulas Patocka, David Miller, linux-ide

On Thursday 14 January 2010 08:24:10 pm Alan Cox wrote:
> > The computer locks up if Transcend SSD runs in any of UDMA modes.
> > It doesn't lockup with different brand SSD, so this is specific to Transcend.
> > 
> > bzolnier:
> > - limit it to VT82C586A/B + TS64GSSD25-M (per commit 10734fc) for now
> > - add warning message
> 
> Looks good to me.

Thanks.

> > +		printk(KERN_WARNING "%s: disabling UDMA mode due to reported "
> > +			"lockups with this device.\n", drive->name);
> 
> That sounds odd - I think I'd have put ": not using UDMA mode as lockups
> have been reported with this device" or similar.

For compatibility reasons the warning message has been kept identical to
the one used by pata_via driver (please see commit 10734fc for details 8)..

--
Bartlomiej Zolnierkiewicz

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2009-11-18 17:09                           ` Mikulas Patocka
  2009-11-18 17:22                             ` Alan Cox
@ 2011-10-11 17:12                             ` Bartlomiej Zolnierkiewicz
  2011-10-11 19:05                               ` David Miller
  1 sibling, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2011-10-11 17:12 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: David Miller, linux-ide, alan

Mikulas Patocka wrote:

> On Tue, 17 Nov 2009, David Miller wrote:
> 
> > From: Mikulas Patocka <mpatocka@redhat.com>
> > Date: Wed, 4 Nov 2009 20:25:21 -0500 (EST)
> > 
> > > Don't use UDMA on VIA UDMA33 controller with Transcend SSD
> > > 
> > > The computer locks up if Transcend SSD runs in any of UDMA modes.
> > > It doesn't lockup with different brand SSD, so this is specific to Transcend.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> > Mikulas, I'm happy to apply this if you match on the full
> > ID string, not just "TS".
> > 
> > Please update your patch and I'll push it upstream.
> > 
> > Thank you.
> 
> Transcend makes various versions of their SSDs and all begin with TS. You 
> can assume that Transcend SSDs with different capacity or format won't 
> work too because they likely use the same controller. The naming is this:
> 
> TS64GSSD25-M
> TS: Transcend
> 64G: capacity
> SSD25: 2.5" format
> -M: MLC
> 
> So the problem is that if you match against the full string, you are going 
> to miss the other Transcend devices and the patch becomes quite useless.
> 
> If you want to harden it against false negatives, you can grep the string 
> for "SSD", as in the patch below, but there is not anything better to do 
> --- if you include "64G" in the string, you fail on non-64G devices, if 
> you include "SSD25", you fail on 1.8" devices, if you include "-M", you 
> fail on SLC.
> 
> Mikulas
> 
> ---
> 
> Don't use UDMA on VIA UDMA33 controller with Transcend SSD
> 
> The computer locks up after if Transcend SSD runs in any of UDMA modes.
> It doesn't lockup with different brand SSD, so this is specific to Transcend.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Reviewed-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

Still valid for 3.1.  Dave, ping?

> ---
>  drivers/ide/via82cxxx.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> Index: linux-2.6.31.6-fast/drivers/ide/via82cxxx.c
> ===================================================================
> --- linux-2.6.31.6-fast.orig/drivers/ide/via82cxxx.c	2009-11-16 13:08:04.000000000 +0100
> +++ linux-2.6.31.6-fast/drivers/ide/via82cxxx.c	2009-11-18 17:57:22.000000000 +0100
> @@ -195,6 +195,22 @@ static void via_set_pio_mode(ide_drive_t
>  	via_set_drive(drive, XFER_PIO_0 + pio);
>  }
>  
> +static u8 via_udma_filter(ide_drive_t *drive)
> +{
> +	char *m = (char *)&drive->id[ATA_ID_PROD];
> +
> +	/*
> +	 * Restrict UDMA for Transcend flash cards.
> +	 * On VIA 33, UDMA locks up. On VIA 133, it works. I can't test other
> +	 * controllers.
> +	 */
> +	if (!memcmp(m, "TS", 2) && strstr(m, "SSD") &&
> +	    drive->hwif->ultra_mask == ATA_UDMA2)
> +		return 0;
> +
> +	return drive->hwif->ultra_mask;
> +}
> +
>  static struct via_isa_bridge *via_config_find(struct pci_dev **isa)
>  {
>  	struct via_isa_bridge *via_config;
> @@ -372,6 +388,7 @@ static const struct ide_port_ops via_por
>  	.set_pio_mode		= via_set_pio_mode,
>  	.set_dma_mode		= via_set_drive,
>  	.cable_detect		= via82cxxx_cable_detect,
> +	.udma_filter		= via_udma_filter,
>  };
>  
>  static const struct ide_port_info via82cxxx_chipset __devinitdata = {

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2011-10-11 17:12                             ` Bartlomiej Zolnierkiewicz
@ 2011-10-11 19:05                               ` David Miller
  2011-10-11 19:39                                 ` Alan Cox
  0 siblings, 1 reply; 63+ messages in thread
From: David Miller @ 2011-10-11 19:05 UTC (permalink / raw)
  To: bzolnier; +Cc: mpatocka, linux-ide, alan

From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date: Tue, 11 Oct 2011 19:12:01 +0200

> Still valid for 3.1.  Dave, ping?

Feedback given and changes requested/required, patch was not updated
and resubmitted by the author:

http://patchwork.ozlabs.org/patch/38764/

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2011-10-11 19:05                               ` David Miller
@ 2011-10-11 19:39                                 ` Alan Cox
  2011-10-12 14:38                                   ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2011-10-11 19:39 UTC (permalink / raw)
  To: David Miller; +Cc: bzolnier, mpatocka, linux-ide

On Tue, 11 Oct 2011 15:05:04 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date: Tue, 11 Oct 2011 19:12:01 +0200
> 
> > Still valid for 3.1.  Dave, ping?
> 
> Feedback given and changes requested/required, patch was not updated
> and resubmitted by the author:
> 
> http://patchwork.ozlabs.org/patch/38764/

The last state on this I saw it seemd that exactly one person had seen it
on exactly one machine ? Has that changed ?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2011-10-11 19:39                                 ` Alan Cox
@ 2011-10-12 14:38                                   ` Bartlomiej Zolnierkiewicz
  2011-10-12 17:59                                     ` Alan Cox
  0 siblings, 1 reply; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2011-10-12 14:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Miller, mpatocka, linux-ide

Alan Cox wrote:

> On Tue, 11 Oct 2011 15:05:04 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > Date: Tue, 11 Oct 2011 19:12:01 +0200
> > 
> > > Still valid for 3.1.  Dave, ping?
> > 
> > Feedback given and changes requested/required, patch was not updated
> > and resubmitted by the author:
> > 
> > http://patchwork.ozlabs.org/patch/38764/
> 
> The last state on this I saw it seemd that exactly one person had seen it
> on exactly one machine ? Has that changed ?

No, you're right here.

Here is updated patch:

From: Mikulas Patocka <mpatocka@redhat.com>
Subject: [PATCH] via82cxxx: don't use UDMA on VIA UDMA33 controller with Transcend SSD

Don't use UDMA on VIA UDMA33 controller with Transcend SSD.

The computer locks up if Transcend SSD runs in any of UDMA modes.
It doesn't lockup with different brand SSD, so this is specific to Transcend.

bzolnier:
- limit it to VT82C586A/B + TS64GSSD25-M (per commit 10734fc ("pata_via:
  Blacklist some combinations of Transcend Flash and via")) for now
- add warning message

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
---
 drivers/ide/via82cxxx.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Index: b/drivers/ide/via82cxxx.c
===================================================================
--- a/drivers/ide/via82cxxx.c
+++ b/drivers/ide/via82cxxx.c
@@ -220,6 +220,24 @@ static void via_set_pio_mode(ide_hwif_t
 	via_set_drive(hwif, drive);
 }
 
+static u8 via_udma_filter(ide_drive_t *drive)
+{
+	ide_hwif_t *hwif = drive->hwif;
+	struct pci_dev *dev = to_pci_dev(hwif->dev);
+	struct ide_host *host = pci_get_drvdata(dev);
+	struct via82cxxx_dev *vdev = host->host_priv;
+	char *m = (char *)&drive->id[ATA_ID_PROD];
+
+	if (vdev->via_config->id == PCI_DEVICE_ID_VIA_82C586_0 &&
+	    strcmp(m, "TS64GSSD25-M") == 0) {
+		printk(KERN_WARNING "%s: disabling UDMA mode due to reported "
+			"lockups with this device.\n", drive->name);
+		return 0;
+	}
+
+	return hwif->ultra_mask;
+}
+
 static struct via_isa_bridge *via_config_find(struct pci_dev **isa)
 {
 	struct via_isa_bridge *via_config;
@@ -401,6 +419,7 @@ static const struct ide_port_ops via_por
 	.set_pio_mode		= via_set_pio_mode,
 	.set_dma_mode		= via_set_drive,
 	.cable_detect		= via82cxxx_cable_detect,
+	.udma_filter		= via_udma_filter,
 };
 
 static const struct ide_port_info via82cxxx_chipset __devinitdata = {

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2011-10-12 14:38                                   ` Bartlomiej Zolnierkiewicz
@ 2011-10-12 17:59                                     ` Alan Cox
  2011-10-13 10:35                                       ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2011-10-12 17:59 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: David Miller, mpatocka, linux-ide

> > The last state on this I saw it seemd that exactly one person had seen it
> > on exactly one machine ? Has that changed ?
> 
> No, you're right here.

So as far as is known its a weird glitch on one persons box with one
device with one firmware that's not been replicated ?

That's not a patch candidate really is it.

Alan

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD
  2011-10-12 17:59                                     ` Alan Cox
@ 2011-10-13 10:35                                       ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 63+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2011-10-13 10:35 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Miller, mpatocka, linux-ide

Alan Cox wrote:

> > > The last state on this I saw it seemd that exactly one person had seen it
> > > on exactly one machine ? Has that changed ?
> > 
> > No, you're right here.
> 
> So as far as is known its a weird glitch on one persons box with one
> device with one firmware that's not been replicated ?
> 
> That's not a patch candidate really is it.

Should we revert following commit then?

commit 10734fc8d5fbf89e88519d72e58cce83be21941a
Author: Alan Cox <alan@linux.intel.com>
Date:   Mon Nov 30 13:22:43 2009 +0000

    pata_via: Blacklist some combinations of Transcend Flash and via
    
    Reported by Mikulas Patocka.
    
    VIA VT82C586B + Transcend TS64GSSD25-M v0826 does not work in UDMA mode
    
    Signed-off-by: Alan Cox <alan@linux.intel.com>
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2011-10-13 10:42 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-21 18:55 [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD Mikulas Patocka
2009-10-21 19:34 ` Mikael Pettersson
2009-10-21 23:01   ` Mikulas Patocka
2009-10-27 11:34     ` Alan Cox
2009-10-28  1:10       ` Mikulas Patocka
2009-10-21 19:39 ` Bartlomiej Zolnierkiewicz
2009-10-22  0:41   ` David Miller
2009-10-22  9:44     ` Bartlomiej Zolnierkiewicz
2009-10-22 11:00       ` David Miller
2009-10-22 11:15         ` Bartlomiej Zolnierkiewicz
2009-10-22 11:20           ` David Miller
2009-10-23 14:29       ` Mikulas Patocka
2009-10-23 14:31         ` David Miller
2009-10-23 14:44         ` Bartlomiej Zolnierkiewicz
2009-10-23 14:55           ` Mikulas Patocka
2009-10-23 15:03             ` Bartlomiej Zolnierkiewicz
2009-10-23 15:18               ` Daniela Engert
2009-10-23 16:51             ` Alan Cox
2009-10-23 17:27               ` Sergei Shtylyov
2009-10-23 18:22                 ` Alan Cox
2009-10-23 18:52                   ` Bartlomiej Zolnierkiewicz
2009-10-24  3:24                     ` David Miller
2009-10-24 12:38                       ` Bartlomiej Zolnierkiewicz
2009-10-24 12:58                         ` David Miller
2009-10-24 13:13                           ` Bartlomiej Zolnierkiewicz
2009-10-24 13:20                             ` David Miller
2009-10-26 11:36                   ` Mikulas Patocka
2009-10-26 12:18                     ` Alan Cox
2009-11-05  1:25                       ` [PATCH] Don't use UDMA on VIA UDMA33 controller with Transcend SSD Mikulas Patocka
2009-11-05 10:40                         ` Alan Cox
2009-11-05 22:18                           ` Mikulas Patocka
2009-11-05 22:46                             ` Alan Cox
2009-11-05 23:19                               ` Mikulas Patocka
2009-11-17 12:30                         ` David Miller
2009-11-18 17:09                           ` Mikulas Patocka
2009-11-18 17:22                             ` Alan Cox
2009-11-18 17:32                               ` David Miller
2009-11-18 17:46                                 ` Mikulas Patocka
2009-11-18 17:53                                   ` David Miller
2009-11-18 18:04                                     ` Mikulas Patocka
2009-11-18 17:37                               ` Mikulas Patocka
2009-11-18 17:50                                 ` Alan Cox
2009-11-18 18:02                                   ` Mikulas Patocka
2011-10-11 17:12                             ` Bartlomiej Zolnierkiewicz
2011-10-11 19:05                               ` David Miller
2011-10-11 19:39                                 ` Alan Cox
2011-10-12 14:38                                   ` Bartlomiej Zolnierkiewicz
2011-10-12 17:59                                     ` Alan Cox
2011-10-13 10:35                                       ` Bartlomiej Zolnierkiewicz
2010-01-14 15:49                         ` Bartlomiej Zolnierkiewicz
2010-01-14 19:24                           ` Alan Cox
2010-01-14 20:17                             ` Bartlomiej Zolnierkiewicz
2009-10-23 17:15         ` [PATCH] Serialize CMD643 and CMD646 to fix a hardware bug with SSD Alan Cox
2009-10-22 13:56     ` Alan Cox
2009-10-23  1:30       ` David Miller
2009-10-23 14:50     ` Mikulas Patocka
2009-10-23 20:50       ` Sergei Shtylyov
2009-10-26 11:30         ` Mikulas Patocka
2009-10-26 18:20           ` Sergei Shtylyov
2009-10-24 11:28     ` Frans Pop
2009-10-24 11:31       ` David Miller
2009-10-25  2:48         ` Frans Pop
2009-10-29 10:02           ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.