linux-kernel.vger.kernel.org archive mirror
* SCSI or libata problem with an RDX removable disk
@ 2008-09-04  9:54 Pascal GREGIS
  2008-09-04 11:34 ` Alan Cox
  0 siblings, 1 reply; 7+ messages in thread
From: Pascal GREGIS @ 2008-09-04  9:54 UTC (permalink / raw)
  To: linux-kernel

Hello everyone,

I am turning to you with a new problem involving SCSI or libata.
I have an RDX removable disk connected via SATA to my Linux machine.
I access it through libata as /dev/sdb, like an ordinary disk.
I write to it for a while, with a number of mounts and unmounts.
After a certain number of mount/unmount cycles, or maybe a certain number of hours (not clearly identified), the device refuses to mount, and mount reports "You must specify the filesystem type.".
Looking in /var/log/messages, I find the following logs:


Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] READ CAPACITY failed
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Sense not available.
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Write Protect is off
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Mode Sense: 00 00 00 00
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Asking for cache data failed
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Assuming drive cache: write through
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] READ CAPACITY failed
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Sense not available.
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Write Protect is off
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Mode Sense: 00 00 00 00
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Asking for cache data failed
Jul 15 14:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Assuming drive cache: write through

These lines are logged again when I run cfdisk or parted or hdparm or anything
on /dev/sdb.

My system is :
linux kernel 2.6.21.1 with some patches :
- libata-start_stop_management (http://bugs.gentoo.org/attachment.cgi?id=118829)

compiled with libata.

Motherboard ICH6 family (id 2651)

Does anyone know this problem or have an idea about what it is and what to do?

Thank you in advance

Pascal


* Re: SCSI or libata problem with an RDX removable disk
  2008-09-04  9:54 SCSI or libata problem with an RDX removable disk Pascal GREGIS
@ 2008-09-04 11:34 ` Alan Cox
  2008-09-04 13:52   ` Pascal GREGIS
  2008-09-08  8:19   ` Pascal GREGIS
  0 siblings, 2 replies; 7+ messages in thread
From: Alan Cox @ 2008-09-04 11:34 UTC (permalink / raw)
  To: Pascal GREGIS; +Cc: linux-kernel

> Looking to /var/log/messages I have the following logs :

I need the logs where it fails not afterwards - the ones showing the
drive error and where the fs gives up. Without that it is hard to see
what happened.

> My system is :
> linux kernel 2.6.21.1 with some patches :
> - libata-start_stop_management (http://bugs.gentoo.org/attachment.cgi?id=118829)

It would be useful to know if a 2.6.25/2.6.26 kernel without other
patches does the same thing.
> 
> compiled with libata.
> 
> Motherboard ICH6 family (id 2651)
> 
> Does anyone know this problem or have an idea about what it is and what to do?

Insufficient information to guess right now.


* Re: SCSI or libata problem with an RDX removable disk
  2008-09-04 11:34 ` Alan Cox
@ 2008-09-04 13:52   ` Pascal GREGIS
  2008-09-08 10:21     ` Alan Cox
  2008-09-08  8:19   ` Pascal GREGIS
  1 sibling, 1 reply; 7+ messages in thread
From: Pascal GREGIS @ 2008-09-04 13:52 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote, on Thu 04 Sep 2008 at 12:34:18:
> > Looking to /var/log/messages I have the following logs :
> 
> I need the logs where it fails not afterwards - the ones showing the
> drive error and where the fs gives up. Without that it is hard to see
> what happened.
Sorry, you're right, I had missed these logs:
Sep  4 08:03:01 devsni1 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  4 08:03:01 devsni1 kernel: ata4.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x2a data 131072 out
Sep  4 08:03:01 devsni1 kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  4 08:03:08 devsni1 kernel: ata4: port is slow to respond, please be patient (Status 0xd0)
Sep  4 08:03:31 devsni1 kernel: ata4: port failed to respond (30 secs, Status 0xd0)
Sep  4 08:03:31 devsni1 kernel: ata4: soft resetting port
Sep  4 08:03:32 devsni1 kernel: ATA: abnormal status 0xD0 on port 0x0001d807
Sep  4 08:03:32 devsni1 last message repeated 4 times
Sep  4 08:06:14 devsni1 kernel: 
Sep  4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000
Sep  4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700080
Sep  4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000
Sep  4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700336
Sep  4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000
Sep  4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700592
Sep  4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000
Sep  4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700848
... and so on with always different sector numbers.
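
For reference, the "return code = 0x00040000" value in the log above can be unpacked by hand. The SCSI midlayer result word of this kernel era packs four bytes: the status byte in bits 0-7, the message byte in bits 8-15, the host byte in bits 16-23 and the driver byte in bits 24-31. The sketch below is a minimal userspace decoder, not kernel code; the struct and function names are hypothetical, and only the numeric DID_* values are taken from include/scsi/scsi.h.

```c
#include <stdint.h>

/* Fields of the 32-bit SCSI result word, lowest byte first. */
struct scsi_result {
	uint8_t status;  /* SCSI status returned by the device */
	uint8_t msg;     /* message byte */
	uint8_t host;    /* host byte, set by the low-level driver or midlayer */
	uint8_t driver;  /* driver byte */
};

/* Hypothetical helper: split a result word into its four bytes. */
static struct scsi_result decode_scsi_result(uint32_t result)
{
	struct scsi_result r;

	r.status = result & 0xff;
	r.msg    = (result >> 8) & 0xff;
	r.host   = (result >> 16) & 0xff;
	r.driver = (result >> 24) & 0xff;
	return r;
}

/* Names of the first eight host byte codes (0x00 to 0x07). */
static const char *host_byte_name(uint8_t host)
{
	static const char *const names[] = {
		"DID_OK", "DID_NO_CONNECT", "DID_BUS_BUSY", "DID_TIME_OUT",
		"DID_BAD_TARGET", "DID_ABORT", "DID_PARITY", "DID_ERROR",
	};

	if (host < sizeof(names) / sizeof(names[0]))
		return names[host];
	return "unknown";
}
```

Decoding 0x00040000 this way gives a host byte of 0x04 with the other three bytes zero, and 0x04 is DID_BAD_TARGET, the same host byte that the "READ CAPACITY failed" messages in the first mail spell out by name.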

What is strange is that before these errors, the log contains entries like:

Sep  4 07:27:32 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:27:32 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:27:32 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:27:32 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:27:32 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:27:32 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:28:20 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:28:20 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:28:20 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:28:20 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:28:20 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:28:20 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:28:52 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:28:52 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:28:52 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:28:52 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:28:52 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:28:52 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:36:19 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:36:19 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:36:19 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:36:20 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:36:20 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:36:20 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:37:01 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:37:01 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:37:01 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:37:01 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:37:01 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:37:01 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:38:12 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:38:12 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:38:12 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep  4 07:38:12 devsni1 kernel: kjournald starting.  Commit interval 5 seconds
Sep  4 07:38:12 devsni1 kernel: EXT3 FS on sdb, internal journal
Sep  4 07:38:12 devsni1 kernel: EXT3-fs: mounted filesystem with ordered data mode.

and my whole /var/log/messages is flooded with such entries, from well before 07:28 until the error happens.
Maybe this is due to commands run automatically by our software on this machine. Do you know whether this is normal?

> 
> > My system is :
> > linux kernel 2.6.21.1 with some patches :
> > - libata-start_stop_management (http://bugs.gentoo.org/attachment.cgi?id=118829)
> 
> It would be useful to know if a 2.6.25/2.6.26 kernel without other
> patches does the same thing.
Yes, but this is not so easy to do; we don't yet have a clear picture of how often the bug reproduces.
I'll see what I can do.

> > 
> > compiled with libata.
> > 
> > Motherboard ICH6 family (id 2651)
> > ...

Regards

Pascal


* Re: SCSI or libata problem with an RDX removable disk
  2008-09-04 11:34 ` Alan Cox
  2008-09-04 13:52   ` Pascal GREGIS
@ 2008-09-08  8:19   ` Pascal GREGIS
  1 sibling, 0 replies; 7+ messages in thread
From: Pascal GREGIS @ 2008-09-08  8:19 UTC (permalink / raw)
  To: linux-kernel

Hi everyone,

I posted this problem to this mailing list last week and got an answer from Alan Cox asking for more information.
After I provided that information, I didn't get any further answer.
So I am trying once more to get help from some of you.

Here is my problem:
I have a Linux box with an RDX removable disk attached via SATA. A piece of software regularly uses this RDX: it mounts it, reads from and/or writes to it, and unmounts it.
But after a certain time or a certain number of uses (not clearly identified), the device stops responding, and mount prints something like:
"There is no filesystem on this device"

In /var/log/messages I have :
Sep  4 08:03:01 devsni1 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  4 08:03:01 devsni1 kernel: ata4.00: cmd a0/01:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x2a data 131072 out
Sep  4 08:03:01 devsni1 kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep  4 08:03:08 devsni1 kernel: ata4: port is slow to respond, please be patient (Status 0xd0)
Sep  4 08:03:31 devsni1 kernel: ata4: port failed to respond (30 secs, Status 0xd0)
Sep  4 08:03:31 devsni1 kernel: ata4: soft resetting port
Sep  4 08:03:32 devsni1 kernel: ATA: abnormal status 0xD0 on port 0x0001d807
Sep  4 08:03:32 devsni1 last message repeated 4 times
Sep  4 08:06:14 devsni1 kernel: 
Sep  4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000
Sep  4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700080
Sep  4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000
Sep  4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700336
Sep  4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000
Sep  4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700592
Sep  4 08:06:14 devsni1 kernel: sd 3:0:0:0: SCSI error: return code = 0x00040000
Sep  4 08:06:14 devsni1 kernel: end_request: I/O error, dev sdb, sector 37700848
... and so on with always different sector numbers.
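
One detail in these logs can be checked with simple arithmetic. The failed ATA command is a0 (PACKET) carrying cdb 0x2a, which is SCSI WRITE(10), with "data 131072 out". The block layer reports end_request sector numbers in 512-byte units, so 131072 bytes is 256 sectors, and the failed sectors listed above are exactly 256 apart. The sketch below only verifies that arithmetic; the helper names are hypothetical, and reading the errors as a run of back-to-back 128 KiB writes is an interpretation, not something the logs state.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SECTOR_BYTES 512u  /* unit of the end_request sector numbers */

/* Convert a transfer length in bytes to a count of 512-byte sectors. */
static uint32_t bytes_to_sectors(uint32_t bytes)
{
	return bytes / BLOCK_SECTOR_BYTES;
}

/* Return 1 if every sector number follows the previous one by 'stride'. */
static int has_constant_stride(const uint64_t *sectors, size_t n,
			       uint64_t stride)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (sectors[i] - sectors[i - 1] != stride)
			return 0;
	}
	return 1;
}
```

With the four sector numbers from the log (37700080, 37700336, 37700592, 37700848), has_constant_stride() holds for a stride of 256, which matches bytes_to_sectors(131072).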

And then every time I run mount, parted, dd or anything else on the device, I get the following logs:

Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] READ CAPACITY failed
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Sense not available.
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Write Protect is off
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Mode Sense: 00 00 00 00
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Asking for cache data failed
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Assuming drive cache: write through
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] READ CAPACITY failed
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Sense not available.
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Write Protect is off
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Mode Sense: 00 00 00 00
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Asking for cache data failed
Sep  4 08:55:54 testrdx kernel: sd 0:0:1:0: [sdb] Assuming drive cache: write through

Does anyone know what the errors in these logs refer to, whether there is a known bug in this area, or anything else that could help me?

My system is :
linux kernel 2.6.21.1 with some patches :
- libata-start_stop_management (http://bugs.gentoo.org/attachment.cgi?id=118829)

compiled with libata.
Motherboard ICH6 family (id 2651)
...

Alan Cox suggested that I test with a 2.6.25/2.6.26 kernel without other
patches, but this is not so easy to do; I don't yet have a clear picture of how often the bug reproduces.
I'll see what I can do.

Regards

Pascal


* Re: SCSI or libata problem with an RDX removable disk
  2008-09-04 13:52   ` Pascal GREGIS
@ 2008-09-08 10:21     ` Alan Cox
  2008-09-08 18:58       ` Mark Lord
  0 siblings, 1 reply; 7+ messages in thread
From: Alan Cox @ 2008-09-08 10:21 UTC (permalink / raw)
  To: Pascal GREGIS; +Cc: linux-kernel, Mark Lord

> Sep  4 08:03:08 devsni1 kernel: ata4: port is slow to respond, please be patient (Status 0xd0)
> Sep  4 08:03:31 devsni1 kernel: ata4: port failed to respond (30 secs, Status 0xd0)
> Sep  4 08:03:31 devsni1 kernel: ata4: soft resetting port
> Sep  4 08:03:32 devsni1 kernel: ATA: abnormal status 0xD0 on port 0x0001d807
> Sep  4 08:03:32 devsni1 last message repeated 4 times

Your disk went offline and then refused to come back when the link was
reset. The initial trigger appears to have been the drive, the fact it
didn't come back could either be the drive or a controller problem. We've
seen a few cases where devices or controllers fail to recover from one
end being stuck expecting data.

Mark Lord did some patches to try and drain data in this case but I don't
remember if they were merged yet.

Alan
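
For readers decoding the quoted log lines: the "Status 0xd0" value is the ATA status register, and it can be split into its flag bits. The sketch below is a minimal userspace decoder under the usual ATA bit layout (BSY 0x80, DRDY 0x40, DF 0x20, DSC 0x10, DRQ 0x08, ERR 0x01); the ST_* names and the function are hypothetical, chosen for illustration rather than taken from libata.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ATA status register bits (names local to this sketch). */
enum {
	ST_ERR  = 1 << 0,  /* error */
	ST_DRQ  = 1 << 3,  /* data request */
	ST_DSC  = 1 << 4,  /* device seek complete (legacy meaning) */
	ST_DF   = 1 << 5,  /* device fault */
	ST_DRDY = 1 << 6,  /* device ready */
	ST_BSY  = 1 << 7,  /* busy */
};

/* Write the names of all set bits into buf, separated by spaces. */
static void describe_ata_status(uint8_t status, char *buf, size_t len)
{
	static const struct { uint8_t bit; const char *name; } flags[] = {
		{ ST_BSY, "BSY" }, { ST_DRDY, "DRDY" }, { ST_DF, "DF" },
		{ ST_DSC, "DSC" }, { ST_DRQ, "DRQ" }, { ST_ERR, "ERR" },
	};
	size_t i;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(status & flags[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, flags[i].name, len - strlen(buf) - 1);
	}
}
```

For 0xD0 this yields "BSY DRDY DSC": the device keeps reporting busy across the 30-second wait and the soft reset. DRQ reads as clear in this snapshot, although the ATA specification treats the other status bits as not valid while BSY is set, so little more should be read into them.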


* Re: SCSI or libata problem with an RDX removable disk
  2008-09-08 10:21     ` Alan Cox
@ 2008-09-08 18:58       ` Mark Lord
  2008-09-10  8:42         ` Pascal GREGIS
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Lord @ 2008-09-08 18:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Pascal GREGIS, linux-kernel

Alan Cox wrote:
>> Sep  4 08:03:08 devsni1 kernel: ata4: port is slow to respond, please be patient (Status 0xd0)
>> Sep  4 08:03:31 devsni1 kernel: ata4: port failed to respond (30 secs, Status 0xd0)
>> Sep  4 08:03:31 devsni1 kernel: ata4: soft resetting port
>> Sep  4 08:03:32 devsni1 kernel: ATA: abnormal status 0xD0 on port 0x0001d807
>> Sep  4 08:03:32 devsni1 last message repeated 4 times
> 
> Your disk went offline and then refused to come back when the link was
> reset. The initial trigger appears to have been the drive, the fact it
> didn't come back could either be the drive or a controller problem. We've
> seen a few cases where devices or controllers fail to recover from one
> end being stuck expecting data.
> 
> Mark Lord did some patches to try and drain data in this case but I don't
> remember if they were merged yet.
..

That would be this patch, currently not merged, not maintained,
and probably needs rework for some chipsets.  But for the record:


Tejun Heo wrote:
> Jeff Garzik wrote:
>> Tejun Heo wrote:
>>> Alan Cox wrote:
>>>>> I think there have been enough cases where this draining was necessary.
>>>>>  IIRC, ata_piix was involved in those cases, right?  If so, can you
>>>>> please submit a patch which applies this only to affected controllers?
>>>>> I don't feel too confident about applying this to all SFF controllers.
>>>> Old IDE does it on all controllers bar a couple. So we have a very good
>>>> knowledge of what does/doesn't work. The one that needs care in old ide
>>>> is an ordering issue where a state machine reset done first causes the
>>>> drain of the I/O to hang.
>>> Hmmm... So, do we apply draining to all PATA?  Or is ata_piix SATA
>>> affected too?
>> I would think all SFF controllers, since a lot of first gen SATA are
>> really bridged solutions.  If they are flagging DRQ, I say oblige them :)
>
> Alright, then the posted patch should be good enough.  Mark, can you be
> bothered to regenerate the patch and post it one more time (again)?  It
> seems we all agree the update is needed.

I think this original patch still applies cleanly on at least 2.6.23-rc7.

Drain up to 512 words from host/bridge FIFO on stuck DRQ HSM violation,
rather than just getting stuck there forever.

Signed-off-by: Mark Lord <mlord@pobox.com>
---

--- old/drivers/ata/libata-sff.c	2007-09-28 09:29:22.000000000 -0400
+++ linux/drivers/ata/libata-sff.c	2007-09-28 09:39:44.000000000 -0400
@@ -420,6 +420,28 @@
 	ap->ops->irq_on(ap);
 }
 
+static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+	u8 stat = ata_chk_status(ap);
+	/*
+	 * Try to clear stuck DRQ if necessary,
+	 * by reading/discarding up to two sectors worth of data.
+	 */
+	if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+		unsigned int i;
+		unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
+
+		printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
+									limit);
+		for (i = 0; i < limit ; ++i) {
+			ioread16(ap->ioaddr.data_addr);
+			if (!(ata_chk_status(ap) & ATA_DRQ))
+				break;
+		}
+		printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
+	}
+}
+
 /**
  *	ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
  *	@ap: port to handle error for
@@ -476,7 +498,7 @@
 	}
 
 	ata_altstatus(ap);
-	ata_chk_status(ap);
+	ata_drain_fifo(ap, qc);
 	ap->ops->irq_clear(ap);
 
 	spin_unlock_irqrestore(ap->lock, flags);
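
The idea behind the patch above can be tried outside the kernel. The sketch below is a userspace simulation under stated assumptions, not the patch itself: struct sim_port, sim_status() and sim_read_data() are hypothetical stand-ins for the port, ata_chk_status() and ioread16(), and the device is modelled as a counter of words still waiting in its FIFO. It also ignores the transfer-direction check that the patch performs before draining.

```c
#include <stdint.h>

#define SIM_DRQ 0x08  /* data-request bit, same position as ATA_DRQ */

/* Hypothetical device model: words still pending in the data FIFO. */
struct sim_port {
	unsigned int pending_words;
};

/* Status shows DRQ for as long as the device still has data to hand over. */
static uint8_t sim_status(const struct sim_port *p)
{
	return p->pending_words ? SIM_DRQ : 0;
}

/* Reading the data register consumes one 16-bit word from the FIFO. */
static uint16_t sim_read_data(struct sim_port *p)
{
	if (p->pending_words)
		p->pending_words--;
	return 0;
}

/*
 * Read and discard words while DRQ is set, but never more than 'limit',
 * so a device that never drops DRQ cannot hang the caller.
 * Returns the number of words actually read.
 */
static unsigned int sim_drain_fifo(struct sim_port *p, unsigned int limit)
{
	unsigned int words = 0;

	while (words < limit && (sim_status(p) & SIM_DRQ)) {
		(void)sim_read_data(p);
		words++;
	}
	return words;
}
```

A port with 10 pending words is drained in 10 reads and then shows DRQ clear. A port with 1000 pending words stops at the 512-word limit with DRQ still set, which is the case where draining alone does not recover the device.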


* Re: SCSI or libata problem with an RDX removable disk
  2008-09-08 18:58       ` Mark Lord
@ 2008-09-10  8:42         ` Pascal GREGIS
  0 siblings, 0 replies; 7+ messages in thread
From: Pascal GREGIS @ 2008-09-10  8:42 UTC (permalink / raw)
  To: linux-kernel

Hi Mark and Alan,

Thank you for your answers and for the patch Mark sent in his mail.
Just before that, I had found this patch:
http://kerneltrap.org/mailarchive/linux-kernel/2007/9/27/324334
which seems to be different from the one you sent.

What causes this difference? Do the two patches apply to different kernel versions, or are they different revisions of the same patch, one fixing problems in the other?

Thank you

Pascal


Mark Lord wrote, on Mon 08 Sep 2008 at 02:58:17:
> Alan Cox wrote:
> >>Sep  4 08:03:08 devsni1 kernel: ata4: port is slow to respond, please be patient (Status 0xd0)
> >>Sep  4 08:03:31 devsni1 kernel: ata4: port failed to respond (30 secs, Status 0xd0)
> >>Sep  4 08:03:31 devsni1 kernel: ata4: soft resetting port
> >>Sep  4 08:03:32 devsni1 kernel: ATA: abnormal status 0xD0 on port 0x0001d807
> >>Sep  4 08:03:32 devsni1 last message repeated 4 times
> >
> >Your disk went offline and then refused to come back when the link was
> >reset. The initial trigger appears to have been the drive, the fact it
> >didn't come back could either be the drive or a controller problem. We've
> >seen a few cases where devices or controllers fail to recover from one
> >end being stuck expecting data.
> >
> >Mark Lord did some patches to try and drain data in this case but I don't
> >remember if they were merged yet.
> ..
> 
> That would be this patch, currently not merged, not maintained,
> and probably needs rework for some chipsets.  But for the record:
> 
> 
> Tejun Heo wrote:
> >Jeff Garzik wrote:
> >>Tejun Heo wrote:
> >>>Alan Cox wrote:
> >>>>>I think there have been enough cases where this draining was necessary.
> >>>>> IIRC, ata_piix was involved in those cases, right?  If so, can you
> >>>>>please submit a patch which applies this only to affected controllers?
> >>>>>I don't feel too confident about applying this to all SFF controllers.
> >>>>Old IDE does it on all controllers bar a couple. So we have a very good
> >>>>knowledge of what does/doesn't work. The one that needs care in old ide
> >>>>is an ordering issue where a state machine reset done first causes the
> >>>>drain of the I/O to hang.
> >>>Hmmm... So, do we apply draining to all PATA?  Or is ata_piix SATA
> >>>affected too?
> >>I would think all SFF controllers, since a lot of first gen SATA are
> >>really bridged solutions.  If they are flagging DRQ, I say oblige them :)
> >
> >Alright, then the posted patch should be good enough.  Mark, can you be
> >bothered to regenerate the patch and post it one more time (again)?  It
> >seems we all agree the update is needed.
> 
> I think this original patch still applies cleanly on at least 2.6.23-rc7.
> 
> Drain up to 512 words from host/bridge FIFO on stuck DRQ HSM violation,
> rather than just getting stuck there forever.
> 
> Signed-off-by: Mark Lord <mlord@pobox.com>
> ---
> 
> --- old/drivers/ata/libata-sff.c	2007-09-28 09:29:22.000000000 -0400
> +++ linux/drivers/ata/libata-sff.c	2007-09-28 09:39:44.000000000 -0400
> @@ -420,6 +420,28 @@
> 	ap->ops->irq_on(ap);
> }
> 
> +static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
> +{
> +	u8 stat = ata_chk_status(ap);
> +	/*
> +	 * Try to clear stuck DRQ if necessary,
> +	 * by reading/discarding up to two sectors worth of data.
> +	 */
> +	if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
> +		unsigned int i;
> +		unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
> +
> +		printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
> +								 limit);
> +		for (i = 0; i < limit ; ++i) {
> +			ioread16(ap->ioaddr.data_addr);
> +			if (!(ata_chk_status(ap) & ATA_DRQ))
> +				break;
> +		}
> +		printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
> +	}
> +}
> +
> /**
>  *	ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
>  *	@ap: port to handle error for
> @@ -476,7 +498,7 @@
> 	}
> 
> 	ata_altstatus(ap);
> -	ata_chk_status(ap);
> +	ata_drain_fifo(ap, qc);
> 	ap->ops->irq_clear(ap);
> 
> 	spin_unlock_irqrestore(ap->lock, flags);
