linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [rfc/patch] libata -- port configurable delays
@ 2005-05-13 18:58 Benjamin LaHaise
  2005-05-13 19:13 ` Jeff Garzik
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Benjamin LaHaise @ 2005-05-13 18:58 UTC (permalink / raw)
  To: jgarzik; +Cc: linux-kernel

Hello Jeff et al,

The patch below makes the delays in ata_pause() and ata_busy_wait() 
configurable on a per-port basis, and enables the no delay flag on 
the one chipset I've tested on.  Getting rid of the delays is worth 
quite a bit: doing sequential 512 byte O_DIRECT AIO reads results in 
a drop from 35.743s to 29.205s using simple-aio-min_nr 20480 10 (a copy 
is available at http://www.kvack.org/~bcrl/simple-aio-min_nr.c).  
Before this patch __delay() is the number one entry in oprofile 
results for this workload.  Does this look like a reasonable approach 
for chipsets that aren't completely braindead?  Cheers,

		-ben
-- 
"Time is what keeps everything from happening all at once." -- John Wheeler

CPU: P4 / Xeon with 2 hyper-threads, speed 2994.52 MHz (estimated)
Counted INSTR_RETIRED events (retired instructions) with a unit mask of 0x03 (mu
ltiple flags) count 90000
samples  %        app name                 symbol name
6676     19.5011  vmlinux                  __delay
923       2.6962  vmlinux                  __blockdev_direct_IO
733       2.1411  vmlinux                  memset
711       2.0769  vmlinux                  kmem_cache_alloc
698       2.0389  vmlinux                  kmem_cache_free
639       1.8666  vmlinux                  mempool_alloc
455       1.3291  vmlinux                  as_insert_request
440       1.2853  vmlinux                  scsi_request_fn
404       1.1801  vmlinux                  ext3_get_branch
404       1.1801  vmlinux                  mempool_free
374       1.0925  vmlinux                  scsi_dispatch_cmd
356       1.0399  vmlinux                  __mod_timer
352       1.0282  vmlinux                  __make_request
...

Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
diff -purN v2.6.12-rc4/drivers/scsi/ata_piix.c libata-rc4/drivers/scsi/ata_piix.c
--- v2.6.12-rc4/drivers/scsi/ata_piix.c	2005-04-28 11:01:54.000000000 -0400
+++ libata-rc4/drivers/scsi/ata_piix.c	2005-05-13 13:30:39.000000000 -0400
@@ -228,6 +228,7 @@ static struct ata_port_info piix_port_in
 		.sht		= &piix_sht,
 		.host_flags	= ATA_FLAG_SATA | ATA_FLAG_SRST |
 				  PIIX_FLAG_COMBINED | PIIX_FLAG_CHECKINTR |
+				  ATA_FLAG_NO_UDELAY |
 				  ATA_FLAG_SLAVE_POSS,
 		.pio_mask	= 0x1f,	/* pio0-4 */
 		.mwdma_mask	= 0x07, /* mwdma0-2 */
diff -purN v2.6.12-rc4/include/linux/libata.h libata-rc4/include/linux/libata.h
--- v2.6.12-rc4/include/linux/libata.h	2005-04-06 17:28:10.000000000 -0400
+++ libata-rc4/include/linux/libata.h	2005-05-13 13:32:15.000000000 -0400
@@ -113,6 +113,7 @@ enum {
 	ATA_FLAG_MMIO		= (1 << 6), /* use MMIO, not PIO */
 	ATA_FLAG_SATA_RESET	= (1 << 7), /* use COMRESET */
 	ATA_FLAG_PIO_DMA	= (1 << 8), /* PIO cmds via DMA */
+	ATA_FLAG_NO_UDELAY	= (1 << 9), /* don't udelay on port access */
 
 	ATA_QCFLAG_ACTIVE	= (1 << 1), /* cmd not yet ack'd to scsi lyer */
 	ATA_QCFLAG_SG		= (1 << 3), /* have s/g table? */
@@ -469,7 +470,8 @@ static inline u8 ata_chk_status(struct a
 static inline void ata_pause(struct ata_port *ap)
 {
 	ata_altstatus(ap);
-	ndelay(400);
+	if (!(ap->flags & ATA_FLAG_NO_UDELAY))
+		ndelay(400);
 }
 
 static inline u8 ata_busy_wait(struct ata_port *ap, unsigned int bits,
@@ -478,7 +480,8 @@ static inline u8 ata_busy_wait(struct at
 	u8 status;
 
 	do {
-		udelay(10);
+		if (!(ap->flags & ATA_FLAG_NO_UDELAY))
+			udelay(10);
 		status = ata_chk_status(ap);
 		max--;
 	} while ((status & bits) && (max > 0));

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 18:58 [rfc/patch] libata -- port configurable delays Benjamin LaHaise
@ 2005-05-13 19:13 ` Jeff Garzik
  2005-05-13 20:03   ` Benjamin LaHaise
  2005-05-14  1:51   ` Mark Lord
  2005-05-13 19:17 ` Jeff Garzik
  2005-05-13 21:20 ` Alan Cox
  2 siblings, 2 replies; 11+ messages in thread
From: Jeff Garzik @ 2005-05-13 19:13 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel, linux-ide, Alan Cox

Benjamin LaHaise wrote:
> Hello Jeff et al,
> 
> The patch below makes the delays in ata_pause() and ata_busy_wait() 
> configurable on a per-port basis, and enables the no delay flag on 
> the one chipset I've tested on.  Getting rid of the delays is worth 
> quite a bit: doing sequential 512 byte O_DIRECT AIO reads results in 
> a drop from 35.743s to 29.205s using simple-aio-min_nr 20480 10 (a copy 
> is available at http://www.kvack.org/~bcrl/simple-aio-min_nr.c).  
> Before this patch __delay() is the number one entry in oprofile 
> results for this workload.  Does this look like a reasonable approach 
> for chipsets that aren't completely braindead?  Cheers,

Well, there are several things going on here.

> @@ -469,7 +470,8 @@ static inline u8 ata_chk_status(struct a
>  static inline void ata_pause(struct ata_port *ap)
>  {
>  	ata_altstatus(ap);
> -	ndelay(400);
> +	if (!(ap->flags & ATA_FLAG_NO_UDELAY))
> +		ndelay(400);

This delay is required per spec.  So, this specific change is vetoed.


> @@ -478,7 +480,8 @@ static inline u8 ata_busy_wait(struct at
>  	u8 status;
>  
>  	do {
> -		udelay(10);
> +		if (!(ap->flags & ATA_FLAG_NO_UDELAY))
> +			udelay(10);
>  		status = ata_chk_status(ap);
>  		max--;
>  	} while ((status & bits) && (max > 0));

This delay is based on field experience, rather than spec.  I'm open to 
making this optional, as you have done.  Some issues related to this 
delay, to consider:

1) Nothing in life is free.  This delay is useful on some platforms, 
because banging away at the Status register for extended periods of time 
can cause an insane amount of PCI IO traffic.  Removing the delay just 
moves the punishment from one area to another.

2) In a few controllers, the SATA<->FIS emulation can go kerflooey if 
you bang the Status register 'too hard'.

3) IIRC some rare PATA devices don't like having their Status register 
banged "too hard".  No data, just a vague memory.

4) It may be worthwhile to rewrite the loop to check the Status register 
_first_, then delay.

Finally, simply disabling the delay is IMO _far_ too dangerous on such a 
popular driver (ata_piix).

I would be conservative, and create a module option for libata (not 
ata_piix) which allows a user to globally disable the delay.  And make 
sure that option defaults to 'delay', the current behavior.

Creative suggestions welcomed...

	Jeff



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 18:58 [rfc/patch] libata -- port configurable delays Benjamin LaHaise
  2005-05-13 19:13 ` Jeff Garzik
@ 2005-05-13 19:17 ` Jeff Garzik
  2005-05-13 21:20 ` Alan Cox
  2 siblings, 0 replies; 11+ messages in thread
From: Jeff Garzik @ 2005-05-13 19:17 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel

Of course, the real solution is to get a modern controller like AHCI or 
SiI 3124, which don't require banging on the Status register -- an 
ultimately pointless, completely emulated operation in SATA.

:)

	Jeff




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 19:13 ` Jeff Garzik
@ 2005-05-13 20:03   ` Benjamin LaHaise
  2005-05-13 21:52     ` Alan Cox
  2005-05-14  1:51   ` Mark Lord
  1 sibling, 1 reply; 11+ messages in thread
From: Benjamin LaHaise @ 2005-05-13 20:03 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, linux-ide, Alan Cox

On Fri, May 13, 2005 at 03:13:50PM -0400, Jeff Garzik wrote:
> This delay is based on field experience, rather than spec.  I'm open to 
> making this optional, as you have done.  Some issues related to this 
> delay, to consider:

Do you have the original bug reports so we know which chipsets / drive 
combinations are problematic?

> 1) Nothing in life is free.  This delay is useful on some platforms, 
> because banging away at the Status register for extended periods of time 
> can cause an insane amount of PCI IO traffic.  Removing the delay just 
> moves the punishment from one area to another.

In this case the time to complete the task is reduced by 18%, so the 
delay is definately excessive.  That 18% is remarkably close to the 
19.5% hit for __delay in the profile results.

> 2) In a few controllers, the SATA<->FIS emulation can go kerflooey if 
> you bang the Status register 'too hard'.

The patch only enables it on one variant of ICH6 SATA so far.

> 3) IIRC some rare PATA devices don't like having their Status register 
> banged "too hard".  No data, just a vague memory.
> 
> 4) It may be worthwhile to rewrite the loop to check the Status register 
> _first_, then delay.
> 
> Finally, simply disabling the delay is IMO _far_ too dangerous on such a 
> popular driver (ata_piix).

Have you a better suggestion for blacklisting devices?

> I would be conservative, and create a module option for libata (not 
> ata_piix) which allows a user to globally disable the delay.  And make 
> sure that option defaults to 'delay', the current behavior.

This option is undesirable for hardware which behaves correctly.  The 
vast majority of users aren't going to be fiddling with hdparm options
(I certainly don't want to see modern SATA being a repeat of the magic 
handwaving the IDE driver and DMA went through for years).

At the very least the option should be introduced and new drivers should 
start life without these legacy delays in place.  If doing it on a per 
port interface isn't good enough, what is?  Globally seems worse to me 
as then any system with a mix of old and new hardware isn't going to do 
the right thing.

		-ben
-- 
"Time is what keeps everything from happening all at once." -- John Wheeler

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 18:58 [rfc/patch] libata -- port configurable delays Benjamin LaHaise
  2005-05-13 19:13 ` Jeff Garzik
  2005-05-13 19:17 ` Jeff Garzik
@ 2005-05-13 21:20 ` Alan Cox
  2005-05-13 23:21   ` Jeff Garzik
  2 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2005-05-13 21:20 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: jgarzik, Linux Kernel Mailing List

On Gwe, 2005-05-13 at 19:58, Benjamin LaHaise wrote:
> is available at http://www.kvack.org/~bcrl/simple-aio-min_nr.c).  
> Before this patch __delay() is the number one entry in oprofile 
> results for this workload.  Does this look like a reasonable approach 
> for chipsets that aren't completely braindead?  Cheers,

If your chipset implements the 400nS lockout in hardware it certainly
seems to make sense. Nice to know someone has put it in hardware


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 20:03   ` Benjamin LaHaise
@ 2005-05-13 21:52     ` Alan Cox
  2005-05-13 23:07       ` Jeff Garzik
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2005-05-13 21:52 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Jeff Garzik, Linux Kernel Mailing List, linux-ide

On Gwe, 2005-05-13 at 21:03, Benjamin LaHaise wrote:
> > 3) IIRC some rare PATA devices don't like having their Status register 
> > banged "too hard".  No data, just a vague memory.

Not that I am aware of. There are a few ICH/PIIX variants where if you
read status during a transaction at the wrong time bad stuff occurs
including to the block on disk. That may be what you are thinking of

> > 
> > 4) It may be worthwhile to rewrite the loop to check the Status register 
> > _first_, then delay.

The 400nS delay after a command is required before status becomes valid.
This isn't about 'incorrect' devices in the command case. It is about
strictly correct behaviour and propogation/response times. For the cases
its not required and you wan to keep PCI load down then checking first
is clearly logical.

Also btw beware of PCI posting - writel/ndelay(400) isn't going to do
the right thing.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 21:52     ` Alan Cox
@ 2005-05-13 23:07       ` Jeff Garzik
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Garzik @ 2005-05-13 23:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: Benjamin LaHaise, Linux Kernel Mailing List, linux-ide

Alan Cox wrote:
>>>4) It may be worthwhile to rewrite the loop to check the Status register 
>>>_first_, then delay.
> 
> 
> The 400nS delay after a command is required before status becomes valid.
> This isn't about 'incorrect' devices in the command case. It is about
> strictly correct behaviour and propogation/response times. For the cases
> its not required and you wan to keep PCI load down then checking first
> is clearly logical.

The 400nS delay isn't the one in the loop.  I was referring to the other 
delay.

Putting the Status register read first will also flush out any posted 
writes, before delaying, and potentially exit more rapidly if the device 
is immediately ready.

	Jeff



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 21:20 ` Alan Cox
@ 2005-05-13 23:21   ` Jeff Garzik
  2005-05-14 21:11     ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff Garzik @ 2005-05-13 23:21 UTC (permalink / raw)
  To: Alan Cox; +Cc: Benjamin LaHaise, Linux Kernel Mailing List

Alan Cox wrote:
> On Gwe, 2005-05-13 at 19:58, Benjamin LaHaise wrote:
> 
>>is available at http://www.kvack.org/~bcrl/simple-aio-min_nr.c).  
>>Before this patch __delay() is the number one entry in oprofile 
>>results for this workload.  Does this look like a reasonable approach 
>>for chipsets that aren't completely braindead?  Cheers,
> 
> 
> If your chipset implements the 400nS lockout in hardware it certainly
> seems to make sense. Nice to know someone has put it in hardware

No, it's just mostly irrelevant under SATA.

Under SATA you are -not- talking to a device when you touch 
Status/AltStatus, you are talking to the host controller.  Specifically, 
you're talking to a controller buffer that stores a copy of the ATA 
shadow registers.

The ATA registers are transmitted to the device in a single packet, 
called a FIS, when the Command or Device Control register is written.

When the device updates its status, or completes a command, it sends a 
FIS from device to controller, instructing the controller to update its 
cached copy of the Status register.

You're bitbanging a buffer, in SATA.

	Jeff



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 19:13 ` Jeff Garzik
  2005-05-13 20:03   ` Benjamin LaHaise
@ 2005-05-14  1:51   ` Mark Lord
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Lord @ 2005-05-14  1:51 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Benjamin LaHaise, linux-kernel, linux-ide, Alan Cox

PIIX chips should be fine w.r.t. unnecessary delays (or lack thereof).

I recall only one situation WAY BACK in time that had an issue with
continuous banging, but don't remember anything of the details
other than that this was circa 1995 or so.

Cheers

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-13 23:21   ` Jeff Garzik
@ 2005-05-14 21:11     ` Alan Cox
  2005-05-14 21:47       ` Jeff Garzik
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2005-05-14 21:11 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Benjamin LaHaise, Linux Kernel Mailing List

On Sad, 2005-05-14 at 00:21, Jeff Garzik wrote:
> > If your chipset implements the 400nS lockout in hardware it certainly
> > seems to make sense. Nice to know someone has put it in hardware
> 
> No, it's just mostly irrelevant under SATA.

libata is for PATA devices too.

> The ATA registers are transmitted to the device in a single packet, 
> called a FIS, when the Command or Device Control register is written.
> 
> When the device updates its status, or completes a command, it sends a 
> FIS from device to controller, instructing the controller to update its 
> cached copy of the Status register.

The controller in ATA style behaviour isn't required to have the status
register valid at the point the FIS is written. It probably does but it
isn't required.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [rfc/patch] libata -- port configurable delays
  2005-05-14 21:11     ` Alan Cox
@ 2005-05-14 21:47       ` Jeff Garzik
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Garzik @ 2005-05-14 21:47 UTC (permalink / raw)
  To: Alan Cox; +Cc: Benjamin LaHaise, Linux Kernel Mailing List

Alan Cox wrote:
> On Sad, 2005-05-14 at 00:21, Jeff Garzik wrote:
> 
>>>If your chipset implements the 400nS lockout in hardware it certainly
>>>seems to make sense. Nice to know someone has put it in hardware
>>
>>No, it's just mostly irrelevant under SATA.
> 
> 
> libata is for PATA devices too.

Yes, but Ben has SATA, which is the hardware in question.

The 400nS delay which interests you so much in this thread isn't the 
interesting delay.  The 400nS delay isn't going away.  The 
udelay(10)-in-a-loop is what slows things down, and shows up on Ben's 
profiles.

The best solution is to convert PIO from polling to interrupt driven, so 
that we're not beating up the bus (big reason for the udelay).


>>The ATA registers are transmitted to the device in a single packet, 
>>called a FIS, when the Command or Device Control register is written.
>>
>>When the device updates its status, or completes a command, it sends a 
>>FIS from device to controller, instructing the controller to update its 
>>cached copy of the Status register.
> 
> 
> The controller in ATA style behaviour isn't required to have the status
> register valid at the point the FIS is written. It probably does but it
> isn't required.

After a tiny period of time, it's required to have set the BSY bit.

	Jeff



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2005-05-14 21:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-05-13 18:58 [rfc/patch] libata -- port configurable delays Benjamin LaHaise
2005-05-13 19:13 ` Jeff Garzik
2005-05-13 20:03   ` Benjamin LaHaise
2005-05-13 21:52     ` Alan Cox
2005-05-13 23:07       ` Jeff Garzik
2005-05-14  1:51   ` Mark Lord
2005-05-13 19:17 ` Jeff Garzik
2005-05-13 21:20 ` Alan Cox
2005-05-13 23:21   ` Jeff Garzik
2005-05-14 21:11     ` Alan Cox
2005-05-14 21:47       ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).