* [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix
@ 2009-04-21 1:41 David Brownell
2009-04-21 10:02 ` Sergei Shtylyov
0 siblings, 1 reply; 9+ messages in thread
From: David Brownell @ 2009-04-21 1:41 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz; +Cc: linux-ide-u79uwXL29TY76Z2rM5mHXA, DaVinci
From: David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Fix UDMA throughput bug: tCYC averages t2CYCTYP/2, but the code
previously assumed it was the same as t2CYCTYP. (That is, it was
using just one clock edge, not both.) Move the table's type
declaration so it's adjacent to the table, making it more clear
what those numbers mean.
On one system this change increased throughput by almost 4x: UDMA/66
sometimes topped 23 MB/sec (on a drive known to do much better). On
another system it was around a 10% win (UDMA/66 up to 7+ MB/sec).
The difference might be caused by the ratio between memory and IDE
clocks. In the system with large speedup, this was exactly 2 (as a
workaround for a rev 1.1 silicon bug). The other system used a more
standard ratio of 1.63 (and rev 2.1 silicon) ... clock domain synch
might have some issues, they're not unheard-of.
Signed-off-by: David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
---
drivers/ide/palm_bk3710.c | 26 ++++++++++++--------------
1 file changed, 12 insertions(+), 14 deletions(-)
--- a/drivers/ide/palm_bk3710.c
+++ b/drivers/ide/palm_bk3710.c
@@ -39,14 +39,6 @@
/* Primary Control Offset */
#define IDE_PALM_ATA_PRI_CTL_OFFSET 0x3F6
-/*
- * PalmChip 3710 IDE Controller UDMA timing structure Definition
- */
-struct palm_bk3710_udmatiming {
- unsigned int rptime; /* Ready to pause time */
- unsigned int cycletime; /* Cycle Time */
-};
-
#define BK3710_BMICP 0x00
#define BK3710_BMISP 0x02
#define BK3710_BMIDTP 0x04
@@ -67,13 +59,19 @@ struct palm_bk3710_udmatiming {
static unsigned ideclk_period; /* in nanoseconds */
+struct palm_bk3710_udmatiming {
+ unsigned int rptime; /* tRP -- Ready to pause time (nsec) */
+ unsigned int cycletime; /* tCYCTYP2/2 -- avg Cycle Time (nsec) */
+ /* tENV is always a minimum of 20 nsec */
+};
+
static const struct palm_bk3710_udmatiming palm_bk3710_udmatimings[6] = {
- {160, 240}, /* UDMA Mode 0 */
- {125, 160}, /* UDMA Mode 1 */
- {100, 120}, /* UDMA Mode 2 */
- {100, 90}, /* UDMA Mode 3 */
- {100, 60}, /* UDMA Mode 4 */
- {85, 40}, /* UDMA Mode 5 */
+ {160, 240 / 2,}, /* UDMA Mode 0 */
+ {125, 160 / 2,}, /* UDMA Mode 1 */
+ {100, 120 / 2,}, /* UDMA Mode 2 */
+ {100, 90 / 2,}, /* UDMA Mode 3 */
+ {100, 60 / 2,}, /* UDMA Mode 4 */
+ {85, 40 / 2,}, /* UDMA Mode 5 */
};
static void palm_bk3710_setudmamode(void __iomem *base, unsigned int dev,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix
2009-04-21 1:41 [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix David Brownell
@ 2009-04-21 10:02 ` Sergei Shtylyov
[not found] ` <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
2009-04-22 18:26 ` Bartlomiej Zolnierkiewicz
0 siblings, 2 replies; 9+ messages in thread
From: Sergei Shtylyov @ 2009-04-21 10:02 UTC (permalink / raw)
To: David Brownell; +Cc: Bartlomiej Zolnierkiewicz, linux-ide, DaVinci
David Brownell wrote:
> Fix UDMA throughput bug: tCYC averages t2CYCTYP/2, but the code
> previously assumed it was the same as t2CYCTYP.
Wow, thanks for finding it!
IIUC however, this should have only affected UDMA writes, not reads
because on reads the device controls the strobe timing.
> (That is, it was using just one clock edge, not both.)
There's no way only one clock edge could have been used since it would
have resulted in CRC errors, so this comment is not to the point.
> On one system this change increased throughput by almost 4x: UDMA/66
> sometimes topped 23 MB/sec (on a drive known to do much better). On
> another system it was around a 10% win (UDMA/66 up to 7+ MB/sec).
It's interesting that on my DM6467 EVM UDMA/66 reads topped at about 29-30
MB/s even without this patch (measuread with hdparm), and on DM6446 EVM they
were only slightly slower...
> The difference might be caused by the ratio between memory and IDE
> clocks. In the system with large speedup, this was exactly 2 (as a
> workaround for a rev 1.1 silicon bug). The other system used a more
> standard ratio of 1.63 (and rev 2.1 silicon) ... clock domain synch
> might have some issues, they're not unheard-of.
Interesting...
> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
The patch itself is:
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
MBR, Sergei
^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>]
* Re: [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix
[not found] ` <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
@ 2009-04-21 10:28 ` David Brownell
0 siblings, 0 replies; 9+ messages in thread
From: David Brownell @ 2009-04-21 10:28 UTC (permalink / raw)
To: Sergei Shtylyov
Cc: linux-ide-u79uwXL29TY76Z2rM5mHXA, DaVinci, Bartlomiej Zolnierkiewicz
On Tuesday 21 April 2009, Sergei Shtylyov wrote:
> David Brownell wrote:
>
> > Fix UDMA throughput bug: tCYC averages t2CYCTYP/2, but the code
> > previously assumed it was the same as t2CYCTYP.
>
> Wow, thanks for finding it!
> IIUC however, this should have only affected UDMA writes, not reads
> because on reads the device controls the strobe timing.
Odd, I certainly noticed UDMA *read* performance changes...
> > (That is, it was using just one clock edge, not both.)
>
> There's no way only one clock edge could have been used since it would
> have resulted in CRC errors, so this comment is not to the point.
That's correct of course ... better to say instead that the
throughput is exactly half what it should have been, since
the clock was using the wrong period.
> > On one system this change increased throughput by almost 4x: UDMA/66
> > sometimes topped 23 MB/sec (on a drive known to do much better). On
> > another system it was around a 10% win (UDMA/66 up to 7+ MB/sec).
>
> It's interesting that on my DM6467 EVM UDMA/66 reads topped at about 29-30
> MB/s even without this patch (measuread with hdparm), and on DM6446 EVM they
> were only slightly slower...
I've yet to see transfer rates that fast. Even drives that have
topped 40 MB/sec on a PC don't get that speed.
> > The difference might be caused by the ratio between memory and IDE
> > clocks. In the system with large speedup, this was exactly 2 (as a
> > workaround for a rev 1.1 silicon bug). The other system used a more
> > standard ratio of 1.63 (and rev 2.1 silicon) ... clock domain synch
> > might have some issues, they're not unheard-of.
>
> Interesting...
Yeah, but not what I'd call conclusive. The two drives in question
were both TravelStars ... one older 20 MByte, and a less old 60 MByte.
So i have no reason to think either would have trouble going "fast".
> > Signed-off-by: David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
>
> The patch itself is:
>
> Acked-by: Sergei Shtylyov <sshtylyov-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
>
> MBR, Sergei
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix
2009-04-21 10:02 ` Sergei Shtylyov
[not found] ` <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
@ 2009-04-22 18:26 ` Bartlomiej Zolnierkiewicz
2009-04-22 20:32 ` David Brownell
[not found] ` <200904222026.11495.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
1 sibling, 2 replies; 9+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-04-22 18:26 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: David Brownell, linux-ide, DaVinci
On Tuesday 21 April 2009 12:02:22 Sergei Shtylyov wrote:
> David Brownell wrote:
>
> > Fix UDMA throughput bug: tCYC averages t2CYCTYP/2, but the code
> > previously assumed it was the same as t2CYCTYP.
>
> Wow, thanks for finding it!
> IIUC however, this should have only affected UDMA writes, not reads
> because on reads the device controls the strobe timing.
>
> > (That is, it was using just one clock edge, not both.)
>
> There's no way only one clock edge could have been used since it would
> have resulted in CRC errors, so this comment is not to the point.
>
> > On one system this change increased throughput by almost 4x: UDMA/66
> > sometimes topped 23 MB/sec (on a drive known to do much better). On
> > another system it was around a 10% win (UDMA/66 up to 7+ MB/sec).
>
> It's interesting that on my DM6467 EVM UDMA/66 reads topped at about 29-30
> MB/s even without this patch (measuread with hdparm), and on DM6446 EVM they
> were only slightly slower...
>
> > The difference might be caused by the ratio between memory and IDE
> > clocks. In the system with large speedup, this was exactly 2 (as a
> > workaround for a rev 1.1 silicon bug). The other system used a more
> > standard ratio of 1.63 (and rev 2.1 silicon) ... clock domain synch
> > might have some issues, they're not unheard-of.
>
> Interesting...
>
> > Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
>
> The patch itself is:
>
> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
applied
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix
2009-04-22 18:26 ` Bartlomiej Zolnierkiewicz
@ 2009-04-22 20:32 ` David Brownell
[not found] ` <200904222026.11495.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
1 sibling, 0 replies; 9+ messages in thread
From: David Brownell @ 2009-04-22 20:32 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz; +Cc: Sergei Shtylyov, linux-ide, DaVinci
On Wednesday 22 April 2009, Bartlomiej Zolnierkiewicz wrote:
> On Tuesday 21 April 2009 12:02:22 Sergei Shtylyov wrote:
> > David Brownell wrote:
> >
> > > Fix UDMA throughput bug: tCYC averages t2CYCTYP/2, but the code
> > > previously assumed it was the same as t2CYCTYP.
> >
> > Wow, thanks for finding it!
> > ....
> >
> > > Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
> >
> > The patch itself is:
> >
> > Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
>
> applied
Thanks. Further experimentation suggests (a) doing timing
computations in picoseconds, rather than nanoseconds, gives
better answers and makes that "slow" system faster by 25% or
more; and (b) increasing all the UDMA timings by one clock
cycle gives another few percent.
Now (a) makes obvious sense; but (b) seems odd ...
- Dave
^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <200904222026.11495.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
end of thread, other threads:[~2009-04-23 20:44 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-21 1:41 [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix David Brownell
2009-04-21 10:02 ` Sergei Shtylyov
[not found] ` <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
2009-04-21 10:28 ` David Brownell
2009-04-22 18:26 ` Bartlomiej Zolnierkiewicz
2009-04-22 20:32 ` David Brownell
[not found] ` <200904222026.11495.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-22 20:43 ` David Brownell
2009-04-22 21:05 ` Bartlomiej Zolnierkiewicz
[not found] ` <200904222305.54820.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-22 22:29 ` David Brownell
2009-04-23 20:48 ` Bartlomiej Zolnierkiewicz
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.