* Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale
@ 2007-02-21 22:56 Fajun Chen
  2007-02-22  0:04 ` Alan
  0 siblings, 1 reply; 7+ messages in thread
From: Fajun Chen @ 2007-02-21 22:56 UTC (permalink / raw)
  To: linux-ide; +Cc: Tejun Heo, alan

Hi Folks,

I've noticed the following code in both pata_sil680.c and IDE code siimage.c
        /* FIXME: double check */
	pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, (class_rev) ? 1 : 255);
I was unable to find the recommended setting in the Sil680 documentation.
Could someone explain the rationale behind the code above? Does it need
to be adjusted on different processors for PCI read/write performance?

The problem I am investigating is slow IO with PATA Sil680 on an ARM
XScale processor (VIVT cache) but not on i386. Based on the libata trace
below, a Read DMA command took about 4ms to finish:
[4294934.196000] ata_scsi_dump_cdb: CDB (1:0,0,0) 28 00 00 0e fa 00 00 00 80
[4294934.196000] ata_scsi_translate: ENTER
[4294934.196000] scsi_10_lba_len: ten-byte command
[4294934.196000] ata_sg_setup: ENTER, ata1
[4294934.196000] ata_sg_setup: 13 sg elements mapped
[4294934.196000] ata_fill_sg: PRD[0] = (0x2C3F000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[1] = (0x2D76000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[2] = (0x2C5B000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[3] = (0x2C98000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[4] = (0x2D5E000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[5] = (0x2D71000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[6] = (0x2D7C000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[7] = (0x2D8B000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[8] = (0x2DA1000, 0x1000)
[4294934.196000] ata_fill_sg: PRD[9] = (0x2D0C000, 0x2000)
[4294934.196000] ata_fill_sg: PRD[10] = (0x33FC000, 0x2000)
[4294934.196000] ata_fill_sg: PRD[11] = (0x2D8C000, 0x2000)
[4294934.196000] ata_fill_sg: PRD[12] = (0x2C06000, 0x1000)
[4294934.196000] ata1: ata_dev_select: ENTER, ata1: device 0, wait 1
[4294934.196000] ata_tf_load_pio: feat 0x0 nsect 0x80 lba 0x0 0xFA 0xE
[4294934.196000] ata_tf_load_pio: device 0xE0
[4294934.196000] ata_exec_command_pio: ata1: cmd 0xC8
[4294934.196000] ata_scsi_translate: EXIT
[4294934.200000] ata_host_intr: ata1: protocol 3 task_state 3
[4294934.200000] ata_host_intr: ata1: host_stat 0x4
[4294934.200000] ata_hsm_move: ata1: protocol 3 task_state 3 (dev_stat 0x50)
[4294934.200000] ata_hsm_move: ata1: dev 0 command complete, drv_stat 0x50
[4294934.200000] ata_sg_clean: unmapping 13 sg elements

I ran the same test on i386 with the same PATA Sil680 HBA, and the time
from command issue to the completion interrupt drops to around 1ms:
[  113.494605] ata_scsi_dump_cdb: CDB (5:0,0,0) 28 00 00 0a ad 80 00 00 80
[  113.494674] ata_scsi_translate: ENTER
[  113.494731] scsi_10_lba_len: ten-byte command
[  113.494791] ata_sg_setup: ENTER, ata5
[  113.494847] ata_sg_setup: 2 sg elements mapped
[  113.494907] ata_fill_sg: PRD[0] = (0x1158000, 0x4000)
[  113.494968] ata_fill_sg: PRD[1] = (0x1170000, 0xC000)
[  113.495029] ata5: ata_dev_select: ENTER, ata5: device 0, wait 1
[  113.495125] ata_tf_load_pio: feat 0x0 nsect 0x80 lba 0x80 0xAD 0xA
[  113.495190] ata_tf_load_pio: device 0xE0
[  113.495261] ata_exec_command_pio: ata5: cmd 0xC8
[  113.495324] ata_scsi_translate: EXIT
[  113.496005] ata_host_intr: ata5: protocol 3 task_state 3
[  113.496068] ata_host_intr: ata5: host_stat 0x4
[  113.496135] ata_hsm_move: ata5: protocol 3 task_state 3 (dev_stat 0x50)
[  113.496201] ata_hsm_move: ata5: dev 0 command complete, drv_stat 0x50
[  113.496266] ata_sg_clean: unmapping 2 sg elements
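
As a rough cross-check of the two traces above: both requests carry nsect
0x80 (128 sectors, 64 KiB), so the PRD lengths should add up to 0x10000 in
each case, and the gap between ata_exec_command_pio and ata_host_intr gives
an approximate transfer rate. The sketch below is only illustrative; the
struct and function names are made up for this example, and the ARM
timestamps only change in 4ms steps in the trace, so the ARM figure is
coarse.

```c
#include <stddef.h>
#include <stdint.h>

/* One PRD entry as printed by ata_fill_sg: (bus address, byte count). */
struct prd {
	uint32_t addr;
	uint32_t len;
};

/* Scatter/gather tables copied from the ARM XScale and i386 traces. */
static const struct prd arm_prd[] = {
	{0x2C3F000, 0x1000}, {0x2D76000, 0x1000}, {0x2C5B000, 0x1000},
	{0x2C98000, 0x1000}, {0x2D5E000, 0x1000}, {0x2D71000, 0x1000},
	{0x2D7C000, 0x1000}, {0x2D8B000, 0x1000}, {0x2DA1000, 0x1000},
	{0x2D0C000, 0x2000}, {0x33FC000, 0x2000}, {0x2D8C000, 0x2000},
	{0x2C06000, 0x1000},
};

static const struct prd i386_prd[] = {
	{0x1158000, 0x4000}, {0x1170000, 0xC000},
};

/* Total number of bytes described by a PRD table. */
static uint32_t prd_total_bytes(const struct prd *table, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += table[i].len;
	return sum;
}

/* Bytes per second for `bytes` transferred in `usec` microseconds. */
static double throughput_bps(uint32_t bytes, uint32_t usec)
{
	return (double)bytes * 1e6 / (double)usec;
}
```

With the timestamps from the traces (4000 us on ARM, 113.496005 - 113.495261
= 744 us on i386), throughput_bps() gives roughly 16 MB/s versus roughly
88 MB/s for the same 64 KiB request.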

I also observed that the same ATA command (Read DMA) took around 1ms on
the same test hardware with a SATA Sil3124 HBA.

As part of the experiments, I changed the Sil680 cache line size to
0x08, 0x04, 0x02, etc., but IO performance did not improve. So what
might be the bottleneck causing the slow IO on ARM XScale?
Thanks in advance for your help!

Thanks,
Fajun


* Re: Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale
  2007-02-21 22:56 Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale Fajun Chen
@ 2007-02-22  0:04 ` Alan
  2007-02-22  1:21   ` Fajun Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Alan @ 2007-02-22  0:04 UTC (permalink / raw)
  To: Fajun Chen; +Cc: linux-ide, Tejun Heo

On Wed, 21 Feb 2007 15:56:28 -0700
"Fajun Chen" <fajunchen@gmail.com> wrote:

> Hi Folks,
> 
> I've noticed the following code in both pata_sil680.c and IDE code siimage.c
>         /* FIXME: double check */
> 	pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, (class_rev) ? 1 : 255);
> I was unable to find the recommended setting in Sil680 document. Could
> someone explain the rational behind the code above? Does it need to be
> adjusted on different processors for PCI read/write performance?

The code is inherited from the original bits by Andre Hedrick who had
access to the chip errata documents, which afaik have never been
published. 



* Re: Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale
  2007-02-22  0:04 ` Alan
@ 2007-02-22  1:21   ` Fajun Chen
  2007-02-22 18:52     ` Alan
  0 siblings, 1 reply; 7+ messages in thread
From: Fajun Chen @ 2007-02-22  1:21 UTC (permalink / raw)
  To: Alan; +Cc: linux-ide, Tejun Heo

On 2/21/07, Alan <alan@lxorguk.ukuu.org.uk> wrote:
> On Wed, 21 Feb 2007 15:56:28 -0700
> "Fajun Chen" <fajunchen@gmail.com> wrote:
>
> > Hi Folks,
> >
> > I've noticed the following code in both pata_sil680.c and IDE code siimage.c
> >         /* FIXME: double check */
> >       pci_write_config_byte(pdev, PCI_CACHE_LINE_SIZE, (class_rev) ? 1 : 255);
> > I was unable to find the recommended setting in Sil680 document. Could
> > someone explain the rational behind the code above? Does it need to be
> > adjusted on different processors for PCI read/write performance?
>
> The code is inherited from the original bits by Andre Hedrick who had
> access to the chip errata documents, which afaik have never been
> published.
>
Thanks for the update, Alan.

I ran another experiment, changing the Cache Line Size to 0x02 and the
Latency Timer to 0x40, and PCI read (Read DMA) performance almost
doubled. So there seems to be room for further performance improvement
by tweaking the PCI configuration. One concern about the latency timer
change is increased bus hold time: could this delay other devices that
share the PCI bus?
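
For a sense of scale on the bus hold time: the Latency Timer register
counts PCI bus clocks, so its value can be converted to a time once the bus
clock is known. The sketch below assumes a 33 MHz bus, which is an
assumption and not something stated in this thread, and the helper name is
made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: time in nanoseconds covered by a PCI Latency Timer
 * value, given the bus clock in kHz. The register counts bus clocks.
 * Returns 0 if the clock is given as 0 to avoid dividing by zero.
 */
static uint64_t latency_timer_ns(uint8_t timer, uint32_t bus_khz)
{
	if (bus_khz == 0)
		return 0;
	/* clocks / (kHz * 1000) seconds == clocks * 1e6 / kHz nanoseconds */
	return (uint64_t)timer * 1000000u / bus_khz;
}
```

For example, latency_timer_ns(0x40, 33333) comes out at about 1920 ns, so
under that assumption a setting of 0x40 bounds a burst at roughly 2 us once
another master is waiting for the bus.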

Since Sil3124 has better PCI read/write performance, as a reference,
could someone explain or point me to the PCI configuration code for
Sil3124? I couldn't find it in sata_sil24.c.

The kernel version used is 2.6.18-rc2 and libata version is 2.00.

Thanks,
Fajun


* Re: Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale
  2007-02-22 18:52     ` Alan
@ 2007-02-22 18:18       ` Jeff Garzik
  2007-02-22 23:14         ` Fajun Chen
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2007-02-22 18:18 UTC (permalink / raw)
  To: Alan; +Cc: Fajun Chen, linux-ide, Tejun Heo

Alan wrote:
>> Since Sil3124 has better PCI read/write performance, as a reference,
>> could someone explain or point me to the PCI configuration code for
>> Sil3124? I couldn't find it in sata_sil24.c.
> 
> Are you sure the values used are not the power on ones in this case ?

The values used, most likely, are BIOS-programmed.

sata_sil24.c does not call pci_set_mwi(), which is the only code in the 
kernel (besides driver-specific, hand-coded stuff) that adjusts the PCI 
cacheline register value.
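
One way to see which values a device ended up with, whoever programmed
them, is to read the first bytes of its configuration space from userspace.
The sketch below assumes the sysfs "config" file for the device is
available; the path in the comment is a placeholder, and the struct and
function names are made up for this example. The two offsets are the
standard header offsets behind PCI_CACHE_LINE_SIZE and PCI_LATENCY_TIMER.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard PCI configuration header offsets. */
#define CFG_CACHE_LINE_SIZE 0x0c
#define CFG_LATENCY_TIMER   0x0d

struct pci_tuning {
	uint8_t cache_line_size;	/* in 32-bit dwords */
	uint8_t latency_timer;		/* in PCI bus clocks */
};

/* Pull both registers out of a raw config-space buffer.
 * Returns 0 on success, -1 if the buffer is too short. */
static int parse_pci_tuning(const uint8_t *cfg, size_t len,
			    struct pci_tuning *out)
{
	if (len <= CFG_LATENCY_TIMER)
		return -1;
	out->cache_line_size = cfg[CFG_CACHE_LINE_SIZE];
	out->latency_timer = cfg[CFG_LATENCY_TIMER];
	return 0;
}

/* Read the header from a config file such as
 * /sys/bus/pci/devices/<address>/config (placeholder path).
 * Returns 0 on success, -1 on any error. */
static int read_pci_tuning(const char *path, struct pci_tuning *out)
{
	uint8_t cfg[64];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	n = fread(cfg, 1, sizeof(cfg), f);
	fclose(f);
	return parse_pci_tuning(cfg, n, out);
}
```

Comparing the output for the Sil3124 and the Sil680 before and after the
driver loads would show whether a value came from firmware or from the
driver's own setup code.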

	Jeff





* Re: Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale
  2007-02-22  1:21   ` Fajun Chen
@ 2007-02-22 18:52     ` Alan
  2007-02-22 18:18       ` Jeff Garzik
  0 siblings, 1 reply; 7+ messages in thread
From: Alan @ 2007-02-22 18:52 UTC (permalink / raw)
  To: Fajun Chen; +Cc: linux-ide, Tejun Heo

> Since Sil3124 has better PCI read/write performance, as a reference,
> could someone explain or point me to the PCI configuration code for
> Sil3124? I couldn't find it in sata_sil24.c.

Are you sure the values used are not the power on ones in this case ?



* Re: Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale
  2007-02-22 18:18       ` Jeff Garzik
@ 2007-02-22 23:14         ` Fajun Chen
  2007-02-22 23:23           ` Jeff Garzik
  0 siblings, 1 reply; 7+ messages in thread
From: Fajun Chen @ 2007-02-22 23:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Alan, linux-ide, Tejun Heo

On 2/22/07, Jeff Garzik <jeff@garzik.org> wrote:
> Alan wrote:
> >> Since Sil3124 has better PCI read/write performance, as a reference,
> >> could someone explain or point me to the PCI configuration code for
> >> Sil3124? I couldn't find it in sata_sil24.c.
> >
> > Are you sure the values used are not the power on ones in this case ?
>
> The values used, most likely, are BIOS-programmed.
>
> sata_sil24.c does not call pci_set_mwi(), which is the only code in the
> kernel (besides driver-specific, hand-coded stuff) that adjusts the PCI
> cacheline register value.
>

I traced the latency timer and cache line size setup for SATA Sil3124.
The latency timer is set to 0x40 and the cache line size is set to 0. For
PATA Sil680, the latency timer is set to 0 and the cache line size is set
to 1. If Sil3124 is a good reference for performance, we should probably
set the latency timer to 0x40 and the cache line size to 0 or 8 for
Sil680. Both the Sil3124 and Sil680 specs recommend using Read Multiple
as a PCI master, and the ARM XScale cache line is 32 bytes, so it is
probably better to set the cache line size to 8 on ARM XScale. I tested
both configurations (cache line size of 0 vs 8), though, and saw no big
difference in IO performance, but both are much better than the default
Sil680 configuration.
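
The value 8 for a 32-byte cache line follows from the register's units:
the PCI Cache Line Size register counts 32-bit dwords, not bytes. A minimal
sketch of the conversion, with helper names made up for this example:

```c
#include <stdint.h>

/* The PCI Cache Line Size register is expressed in 32-bit dwords. */
#define PCI_CLS_UNIT_BYTES 4u

/* Hypothetical helper: register value -> cache line size in bytes. */
static unsigned int cls_reg_to_bytes(uint8_t reg)
{
	return (unsigned int)reg * PCI_CLS_UNIT_BYTES;
}

/*
 * Hypothetical helper: cache line size in bytes -> register value.
 * Returns 0 (the "not programmed" value) if the size is not a whole
 * number of dwords or does not fit in the 8-bit register.
 */
static uint8_t cls_bytes_to_reg(unsigned int bytes)
{
	if (bytes % PCI_CLS_UNIT_BYTES != 0 ||
	    bytes / PCI_CLS_UNIT_BYTES > 0xFF)
		return 0;
	return (uint8_t)(bytes / PCI_CLS_UNIT_BYTES);
}
```

By the same arithmetic, the two values the driver writes today (1 and 255)
correspond to 4 bytes and 1020 bytes, neither of which matches a 32-byte
line.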

Thanks,
Fajun


* Re: Question about PATA Sil680 Cache Line Size and Performance Degradation on ARM XScale
  2007-02-22 23:14         ` Fajun Chen
@ 2007-02-22 23:23           ` Jeff Garzik
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff Garzik @ 2007-02-22 23:23 UTC (permalink / raw)
  To: Fajun Chen; +Cc: Alan, linux-ide, Tejun Heo

Fajun Chen wrote:
> On 2/22/07, Jeff Garzik <jeff@garzik.org> wrote:
>> Alan wrote:
>> >> Since Sil3124 has better PCI read/write performance, as a reference,
>> >> could someone explain or point me to the PCI configuration code for
>> >> Sil3124? I couldn't find it in sata_sil24.c.
>> >
>> > Are you sure the values used are not the power on ones in this case ?
>>
>> The values used, most likely, are BIOS-programmed.
>>
>> sata_sil24.c does not call pci_set_mwi(), which is the only code in the
>> kernel (besides driver-specific, hand-coded stuff) that adjusts the PCI
>> cacheline register value.
>>
> 
> I traced the latency timer and cache line size setup in  SATA Sil3124.
> Latency timer is set to 0x40 and cache line size is set to 0.  For
> PATA Sil680, latency timer is set to 0 and cache line size is set to
> 1.  If Sil3124 is a good reference for performance, we probably should
> set latency timer to 0x40 and cache line size to 0 or 8 for Sil680.
> Both Sil3124 and Sil680 specs recommend using Read Multiple as a PCI

If you are going to be using PCI transactions that operate on cacheline 
size-based quantities of data, you should definitely program the PCI 
config register to a non-zero value.


> master, ARM XScale cache line is 32 bytes, so it is probably better to
> set cache line size to 8 for ARM XScale.   I tested both
> configurations(cache line of 0 vs 8) though, no big difference in IO
> performance but both are much better than default configuration in
> Sil680.

Read pci_set_cacheline_size() in drivers/pci/pci.c for the proper method 
of programming.
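
The general pattern there is read, write, read back, verify, since a device
is not obliged to accept every value. The following is a userspace model of
that pattern only, not the kernel source: struct fake_dev, its
writable_mask field and all function names are hypothetical stand-ins for a
real device and the PCI config accessors.

```c
#include <stdint.h>

/* Hypothetical model of a device's Cache Line Size register. Bits that
 * the device does not implement are cleared by writable_mask. */
struct fake_dev {
	uint8_t cls;
	uint8_t writable_mask;
};

static uint8_t cfg_read_cls(const struct fake_dev *d)
{
	return d->cls;
}

static void cfg_write_cls(struct fake_dev *d, uint8_t v)
{
	d->cls = v & d->writable_mask;
}

/*
 * Program the register to the host cache line size (in dwords) and check
 * that the device kept the value. Returns 0 on success, -1 otherwise.
 */
static int set_cacheline_size(struct fake_dev *d, uint8_t host_cls)
{
	uint8_t cur;

	if (host_cls == 0)
		return -1;

	/* Already a non-zero multiple of the host line size: keep it. */
	cur = cfg_read_cls(d);
	if (cur >= host_cls && cur % host_cls == 0)
		return 0;

	/* Write the new value, then read back to see if it stuck. */
	cfg_write_cls(d, host_cls);
	return cfg_read_cls(d) == host_cls ? 0 : -1;
}
```

A caller that gets -1 back should not enable transactions that depend on
the cache line size, because the device did not accept the value.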

	Jeff




