All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
@ 2009-04-21  1:41 David Brownell
  2009-04-21 10:02 ` Sergei Shtylyov
  0 siblings, 1 reply; 9+ messages in thread
From: David Brownell @ 2009-04-21  1:41 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: linux-ide-u79uwXL29TY76Z2rM5mHXA, DaVinci

From: David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>

Fix UDMA throughput bug:  tCYC averages t2CYCTYP/2, but the code
previously assumed it was the same as t2CYCTYP.  (That is, it was
using just one clock edge, not both.)  Move the table's type
declaration so it's adjacent to the table, making it more clear
what those numbers mean.

On one system this change increased throughput by almost 4x: UDMA/66
sometimes topped 23 MB/sec (on a drive known to do much better).  On
another system it was around a 10% win (UDMA/66 up to 7+ MB/sec).

The difference might be caused by the ratio between memory and IDE
clocks.  In the system with large speedup, this was exactly 2 (as a
workaround for a rev 1.1 silicon bug).  The other system used a more
standard ratio of 1.63 (and rev 2.1 silicon) ... clock domain synch
might have some issues, they're not unheard-of.

Signed-off-by: David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
---
 drivers/ide/palm_bk3710.c |   26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

--- a/drivers/ide/palm_bk3710.c
+++ b/drivers/ide/palm_bk3710.c
@@ -39,14 +39,6 @@
 /* Primary Control Offset */
 #define IDE_PALM_ATA_PRI_CTL_OFFSET 0x3F6
 
-/*
- * PalmChip 3710 IDE Controller UDMA timing structure Definition
- */
-struct palm_bk3710_udmatiming {
-	unsigned int rptime;	/* Ready to pause time  */
-	unsigned int cycletime;	/* Cycle Time           */
-};
-
 #define BK3710_BMICP		0x00
 #define BK3710_BMISP		0x02
 #define BK3710_BMIDTP		0x04
@@ -67,13 +59,19 @@ struct palm_bk3710_udmatiming {
 
 static unsigned ideclk_period; /* in nanoseconds */
 
+struct palm_bk3710_udmatiming {
+	unsigned int rptime;	/* tRP -- Ready to pause time (nsec) */
+	unsigned int cycletime;	/* tCYCTYP2/2 -- avg Cycle Time (nsec) */
+				/* tENV is always a minimum of 20 nsec */
+};
+
 static const struct palm_bk3710_udmatiming palm_bk3710_udmatimings[6] = {
-	{160, 240},		/* UDMA Mode 0 */
-	{125, 160},		/* UDMA Mode 1 */
-	{100, 120},		/* UDMA Mode 2 */
-	{100, 90},		/* UDMA Mode 3 */
-	{100, 60},		/* UDMA Mode 4 */
-	{85,  40},		/* UDMA Mode 5 */
+	{160, 240 / 2,},	/* UDMA Mode 0 */
+	{125, 160 / 2,},	/* UDMA Mode 1 */
+	{100, 120 / 2,},	/* UDMA Mode 2 */
+	{100, 90 / 2,},		/* UDMA Mode 3 */
+	{100, 60 / 2,},		/* UDMA Mode 4 */
+	{85,  40 / 2,},		/* UDMA Mode 5 */
 };
 
 static void palm_bk3710_setudmamode(void __iomem *base, unsigned int dev,

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
  2009-04-21  1:41 [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix David Brownell
@ 2009-04-21 10:02 ` Sergei Shtylyov
       [not found]   ` <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
  2009-04-22 18:26   ` Bartlomiej Zolnierkiewicz
  0 siblings, 2 replies; 9+ messages in thread
From: Sergei Shtylyov @ 2009-04-21 10:02 UTC (permalink / raw)
  To: David Brownell; +Cc: Bartlomiej Zolnierkiewicz, linux-ide, DaVinci

David Brownell wrote:

> Fix UDMA throughput bug:  tCYC averages t2CYCTYP/2, but the code
> previously assumed it was the same as t2CYCTYP.

    Wow, thanks for finding it!
    IIUC however, this should have only affected UDMA writes, not reads 
because on reads the device controls the strobe timing.

> (That is, it was using just one clock edge, not both.)

    There's no way only one clock edge could have been used since it would 
have resulted in CRC errors, so this comment is not to the point.

> On one system this change increased throughput by almost 4x: UDMA/66
> sometimes topped 23 MB/sec (on a drive known to do much better).  On
> another system it was around a 10% win (UDMA/66 up to 7+ MB/sec).

    It's interesting that on my DM6467 EVM UDMA/66 reads topped at about 29-30 
MB/s even without this patch (measuread with hdparm), and on DM6446 EVM they 
were only slightly slower...

> The difference might be caused by the ratio between memory and IDE
> clocks.  In the system with large speedup, this was exactly 2 (as a
> workaround for a rev 1.1 silicon bug).  The other system used a more
> standard ratio of 1.63 (and rev 2.1 silicon) ... clock domain synch
> might have some issues, they're not unheard-of.

    Interesting...

> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>

    The patch itself is:

Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

MBR, Sergei

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
       [not found]   ` <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
@ 2009-04-21 10:28     ` David Brownell
  0 siblings, 0 replies; 9+ messages in thread
From: David Brownell @ 2009-04-21 10:28 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: linux-ide-u79uwXL29TY76Z2rM5mHXA, DaVinci, Bartlomiej Zolnierkiewicz

On Tuesday 21 April 2009, Sergei Shtylyov wrote:
> David Brownell wrote:
> 
> > Fix UDMA throughput bug:  tCYC averages t2CYCTYP/2, but the code
> > previously assumed it was the same as t2CYCTYP.
> 
>     Wow, thanks for finding it!
>     IIUC however, this should have only affected UDMA writes, not reads 
> because on reads the device controls the strobe timing.

Odd, I certainly noticed UDMA *read* performance changes...

 
> > (That is, it was using just one clock edge, not both.)
> 
>     There's no way only one clock edge could have been used since it would 
> have resulted in CRC errors, so this comment is not to the point.

That's correct of course ... better to say instead that the
throughput is exactly half what it should have been, since
the clock was using the wrong period.


> > On one system this change increased throughput by almost 4x: UDMA/66
> > sometimes topped 23 MB/sec (on a drive known to do much better).  On
> > another system it was around a 10% win (UDMA/66 up to 7+ MB/sec).
> 
>     It's interesting that on my DM6467 EVM UDMA/66 reads topped at about 29-30 
> MB/s even without this patch (measuread with hdparm), and on DM6446 EVM they 
> were only slightly slower...

I've yet to see transfer rates that fast.  Even drives that have
topped 40 MB/sec on a PC don't get that speed.


> > The difference might be caused by the ratio between memory and IDE
> > clocks.  In the system with large speedup, this was exactly 2 (as a
> > workaround for a rev 1.1 silicon bug).  The other system used a more
> > standard ratio of 1.63 (and rev 2.1 silicon) ... clock domain synch
> > might have some issues, they're not unheard-of.
> 
>     Interesting...

Yeah, but not what I'd call conclusive.  The two drives in question
were both TravelStars ... one older 20 MByte, and a less old 60 MByte.
So i have no reason to think either would have trouble going "fast".

 
> > Signed-off-by: David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
> 
>     The patch itself is:
> 
> Acked-by: Sergei Shtylyov <sshtylyov-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
> 
> MBR, Sergei
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
  2009-04-21 10:02 ` Sergei Shtylyov
       [not found]   ` <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
@ 2009-04-22 18:26   ` Bartlomiej Zolnierkiewicz
  2009-04-22 20:32     ` David Brownell
       [not found]     ` <200904222026.11495.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 2 replies; 9+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-04-22 18:26 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: David Brownell, linux-ide, DaVinci

On Tuesday 21 April 2009 12:02:22 Sergei Shtylyov wrote:
> David Brownell wrote:
> 
> > Fix UDMA throughput bug:  tCYC averages t2CYCTYP/2, but the code
> > previously assumed it was the same as t2CYCTYP.
> 
>     Wow, thanks for finding it!
>     IIUC however, this should have only affected UDMA writes, not reads 
> because on reads the device controls the strobe timing.
> 
> > (That is, it was using just one clock edge, not both.)
> 
>     There's no way only one clock edge could have been used since it would 
> have resulted in CRC errors, so this comment is not to the point.
> 
> > On one system this change increased throughput by almost 4x: UDMA/66
> > sometimes topped 23 MB/sec (on a drive known to do much better).  On
> > another system it was around a 10% win (UDMA/66 up to 7+ MB/sec).
> 
>     It's interesting that on my DM6467 EVM UDMA/66 reads topped at about 29-30 
> MB/s even without this patch (measuread with hdparm), and on DM6446 EVM they 
> were only slightly slower...
> 
> > The difference might be caused by the ratio between memory and IDE
> > clocks.  In the system with large speedup, this was exactly 2 (as a
> > workaround for a rev 1.1 silicon bug).  The other system used a more
> > standard ratio of 1.63 (and rev 2.1 silicon) ... clock domain synch
> > might have some issues, they're not unheard-of.
> 
>     Interesting...
> 
> > Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
> 
>     The patch itself is:
> 
> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>

applied

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
  2009-04-22 18:26   ` Bartlomiej Zolnierkiewicz
@ 2009-04-22 20:32     ` David Brownell
       [not found]     ` <200904222026.11495.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  1 sibling, 0 replies; 9+ messages in thread
From: David Brownell @ 2009-04-22 20:32 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: Sergei Shtylyov, linux-ide, DaVinci

On Wednesday 22 April 2009, Bartlomiej Zolnierkiewicz wrote:
> On Tuesday 21 April 2009 12:02:22 Sergei Shtylyov wrote:
> > David Brownell wrote:
> > 
> > > Fix UDMA throughput bug:  tCYC averages t2CYCTYP/2, but the code
> > > previously assumed it was the same as t2CYCTYP.
> > 
> >     Wow, thanks for finding it!
> > ....
> > 
> > > Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
> > 
> >     The patch itself is:
> > 
> > Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
> 
> applied

Thanks.  Further experimentation suggests (a) doing timing
computations in picoseconds, rather than nanoseconds, gives
better answers and makes that "slow" system faster by 25% or
more; and (b) increasing all the UDMA timings by one clock
cycle gives another few percent.

Now (a) makes obvious sense; but (b) seems odd ...

- Dave

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
       [not found]     ` <200904222026.11495.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2009-04-22 20:43       ` David Brownell
  2009-04-22 21:05         ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 9+ messages in thread
From: David Brownell @ 2009-04-22 20:43 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: linux-ide-u79uwXL29TY76Z2rM5mHXA, DaVinci

On Wednesday 22 April 2009, Bartlomiej Zolnierkiewicz wrote:
> applied

By the way ... what about the first patch,
which removed accesses to all those non-existent
registers and bitfields?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
  2009-04-22 20:43       ` David Brownell
@ 2009-04-22 21:05         ` Bartlomiej Zolnierkiewicz
       [not found]           ` <200904222305.54820.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-04-22 21:05 UTC (permalink / raw)
  To: David Brownell; +Cc: Sergei Shtylyov, linux-ide, DaVinci

On Wednesday 22 April 2009 22:43:14 David Brownell wrote:
> On Wednesday 22 April 2009, Bartlomiej Zolnierkiewicz wrote:
> > applied
> 
> By the way ... what about the first patch,
> which removed accesses to all those non-existent
> registers and bitfields?

I didn't see any discussion on it and it looked less urgent / more risky
(it is not uncommon for documentation to lack some data) than patch #2.

Should it also go upstream for 2.6.30?

Sergei?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
       [not found]           ` <200904222305.54820.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2009-04-22 22:29             ` David Brownell
  2009-04-23 20:48               ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 9+ messages in thread
From: David Brownell @ 2009-04-22 22:29 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: linux-ide-u79uwXL29TY76Z2rM5mHXA, DaVinci

On Wednesday 22 April 2009, Bartlomiej Zolnierkiewicz wrote:
> 
> > By the way ... what about the first patch,
> > which removed accesses to all those non-existent
> > registers and bitfields?
> 
> I didn't see any discussion on it and it looked less urgent / more risky
> (it is not uncommon for documentation to lack some data) than patch #2.

It's uncommon for TI's documentation to be that far off,
for that long, though.  "Lacking" docs for 50% of the
registers, for several years ... doesn't make sense.

The current reset handling is clearly broken:  the docs
are quite explicit that the controller doesn't drive the
reset signal, it's got to be done through a GPIO.  The
board designs match that part of the docs.  The code is
thus contrary to *all* other documentation.

Early DaVinci drivers sometimes exhibited a flagrant
disregard for chip documentation.  This driver is from
about that era.  Maybe it started from prototypes using
the a different controller design, for example.


> Should it also go upstream for 2.6.30?

I can't say either patch would be urgent for 2.6.30,
but of course it's good that bugfixes merge ASAP.

Since we're only at RC3, I'd be inclined to push both
up right now.  If we were at RC6 or so, I'd hold off
till the next merge window.

- Dave

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 2.6.30-rc2 2/2] palm_bk3710:  UDMA performance fix
  2009-04-22 22:29             ` David Brownell
@ 2009-04-23 20:48               ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 9+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-04-23 20:48 UTC (permalink / raw)
  To: David Brownell; +Cc: Sergei Shtylyov, linux-ide, DaVinci

On Thursday 23 April 2009 00:29:23 David Brownell wrote:
> On Wednesday 22 April 2009, Bartlomiej Zolnierkiewicz wrote:
> > 
> > > By the way ... what about the first patch,
> > > which removed accesses to all those non-existent
> > > registers and bitfields?
> > 
> > I didn't see any discussion on it and it looked less urgent / more risky
> > (it is not uncommon for documentation to lack some data) than patch #2.
> 
> It's uncommon for TI's documentation to be that far off,
> for that long, though.  "Lacking" docs for 50% of the
> registers, for several years ... doesn't make sense.
> 
> The current reset handling is clearly broken:  the docs
> are quite explicit that the controller doesn't drive the
> reset signal, it's got to be done through a GPIO.  The
> board designs match that part of the docs.  The code is
> thus contrary to *all* other documentation.
> 
> Early DaVinci drivers sometimes exhibited a flagrant
> disregard for chip documentation.  This driver is from
> about that era.  Maybe it started from prototypes using
> the a different controller design, for example.
> 
> 
> > Should it also go upstream for 2.6.30?
> 
> I can't say either patch would be urgent for 2.6.30,
> but of course it's good that bugfixes merge ASAP.
> 
> Since we're only at RC3, I'd be inclined to push both
> up right now.  If we were at RC6 or so, I'd hold off
> till the next merge window.

Thanks for the detailed explanation.

I applied the patch to ide-2.6.git/for-linus.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-04-23 20:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-21  1:41 [patch 2.6.30-rc2 2/2] palm_bk3710: UDMA performance fix David Brownell
2009-04-21 10:02 ` Sergei Shtylyov
     [not found]   ` <49ED99AE.3020100-hkdhdckH98+B+jHODAdFcQ@public.gmane.org>
2009-04-21 10:28     ` David Brownell
2009-04-22 18:26   ` Bartlomiej Zolnierkiewicz
2009-04-22 20:32     ` David Brownell
     [not found]     ` <200904222026.11495.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-22 20:43       ` David Brownell
2009-04-22 21:05         ` Bartlomiej Zolnierkiewicz
     [not found]           ` <200904222305.54820.bzolnier-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2009-04-22 22:29             ` David Brownell
2009-04-23 20:48               ` Bartlomiej Zolnierkiewicz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.