All of lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
@ 2010-06-01 11:36 Delio Brignoli
  2010-06-01 13:51 ` Ben Gardiner
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Delio Brignoli @ 2010-06-01 11:36 UTC (permalink / raw)
  To: u-boot

I have updated this patch based on the comments [1] by Wolfgang Denk and 
removed unused variables.
[1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]

Reduce the number of reads per byte transferred on the BUF register from 2 to 1 and
take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM, 
SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a 2Mbyte image file.
Remove unused variables in the spi_xfer() function.

Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>
---
 drivers/spi/davinci_spi.c |   69 ++++++++++++++++++++++-----------------------
 1 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/drivers/spi/davinci_spi.c b/drivers/spi/davinci_spi.c
index 60ba007..d7f130d 100644
--- a/drivers/spi/davinci_spi.c
+++ b/drivers/spi/davinci_spi.c
@@ -131,12 +131,10 @@ int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
 {
 	struct davinci_spi_slave *ds = to_davinci_spi(slave);
 	unsigned int	len, data1_reg_val = readl(&ds->regs->dat1);
-	int		ret, i;
+	unsigned int	i_cnt = 0, o_cnt = 0, buf_reg_val;
 	const u8	*txp = dout; /* dout can be NULL for read operation */
 	u8		*rxp = din;  /* din can be NULL for write operation */
 
-	ret = 0;
-
 	if (bitlen == 0)
 		/* Finish any previously submitted transfers */
 		goto out;
@@ -159,41 +157,42 @@ int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
 	readl(&ds->regs->buf);
 
 	/* keep writing and reading 1 byte until done */
-	for (i = 0; i < len; i++) {
-		/* wait till TXFULL is asserted */
-		while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK);
-
-		/* write the data */
-		data1_reg_val &= ~0xFFFF;
-		if (txp) {
-			data1_reg_val |= *txp;
-			txp++;
+	while ((i_cnt < len) || (o_cnt < len)) {
+		/* read RX buffer and flags */
+		buf_reg_val = readl(&ds->regs->buf);
+
+		/* if data is available */
+		if ((i_cnt < len) && (buf_reg_val & SPIBUF_RXEMPTY_MASK) == 0) {
+			/* if there is no read buffer simply ignore the read character */
+			if (rxp)
+				*rxp++ = buf_reg_val & 0xFF;
+			/* increment read words count */
+			i_cnt++;
 		}
 
-		/*
-		 * Write to DAT1 is required to keep the serial transfer going.
-		 * We just terminate when we reach the end.
-		 */
-		if ((i == (len - 1)) && (flags & SPI_XFER_END)) {
-			/* clear CS hold */
-			writel(data1_reg_val &
-				~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1);
-		} else {
-			/* enable CS hold */
-			data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
+		/* if the tx buffer is empty and there is still data to transmit */
+		if ((o_cnt < len) && ((buf_reg_val & SPIBUF_TXFULL_MASK) == 0)) {
+			/* write the data */
+			data1_reg_val &= ~0xFFFF;
+			if (txp)
+				data1_reg_val |= *txp++;
+			/*
+			 * Write to DAT1 is required to keep the serial transfer going.
+			 * We just terminate when we reach the end.
+			 */
+			if ((o_cnt == (len - 1)) && (flags & SPI_XFER_END)) {
+				/* clear CS hold */
+				writel(data1_reg_val &
+						~(1 << SPIDAT1_CSHOLD_SHIFT),
+						&ds->regs->dat1);
+			} else {
+				/* enable CS hold and write TX register */
+				data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
 					(slave->cs << SPIDAT1_CSNR_SHIFT));
-			writel(data1_reg_val, &ds->regs->dat1);
-		}
-
-		/* read the data - wait for data availability */
-		while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK);
-
-		if (rxp) {
-			*rxp = readl(&ds->regs->buf) & 0xFF;
-			rxp++;
-		} else {
-			/* simply drop the read character */
-			readl(&ds->regs->buf);
+				writel(data1_reg_val, &ds->regs->dat1);
+			}
+			/* increment written words count */
+			o_cnt++;
 		}
 	}
 	return 0;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-01 11:36 [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed Delio Brignoli
@ 2010-06-01 13:51 ` Ben Gardiner
  2010-06-07 19:30 ` Paulraj, Sandeep
  2010-06-17 15:02 ` Nick Thompson
  2 siblings, 0 replies; 13+ messages in thread
From: Ben Gardiner @ 2010-06-01 13:51 UTC (permalink / raw)
  To: u-boot

On Tue, Jun 1, 2010 at 7:36 AM, Delio Brignoli
<dbrignoli@audioscience.com> wrote:
> I have updated this patch based on the comments [1] by Wolfgang Denk and
> removed unused variables.
> [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
>
> Reduce the number of reads per byte transferred on the BUF register from 2 to 1 and
> take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM,
> SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a 2Mbyte image file.
> Remove unused variables in the spi_xfer() function.
>
> Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
> Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>

Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>

Applies cleanly to HEAD (9bb3b3d4406c1e388a99f6fb189147d6a06cc2cf) of
git://git.denx.de/u-boot.git and HEAD
(5f16b8551b125f16cd8d58f278cb25b94272fd9f) of
git://arago-project.org/git/people/sekhar/u-boot-omapl1.git .

I am unable to build an SPI-capable u-boot for the da830 or da850 on
git://git.denx.de/u-boot.git; tested with HEAD of
git://arago-project.org/git/people/sekhar/u-boot-omapl1.git -- there
is a noticeable speed improvement in SPI read speed.

-- 
Ben Gardiner
Nanometrics Inc.
+1 (613) 592-6776 x239
http://www.nanometrics.ca

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-01 11:36 [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed Delio Brignoli
  2010-06-01 13:51 ` Ben Gardiner
@ 2010-06-07 19:30 ` Paulraj, Sandeep
  2010-06-07 21:17   ` Paulraj, Sandeep
  2010-06-17 15:02 ` Nick Thompson
  2 siblings, 1 reply; 13+ messages in thread
From: Paulraj, Sandeep @ 2010-06-07 19:30 UTC (permalink / raw)
  To: u-boot



> 
> I have updated this patch based on the comments [1] by Wolfgang Denk and
> removed unused variables.
> [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
> 
> Reduce the number of reads per byte transferred on the BUF register from 2
> to 1 and
> take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138
> EVM,
> SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a
> 2Mbyte image file.
> Remove unused variables in the spi_xfer() function.
> 
> Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
> Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>
> ---


Checkpatch returned 4 errors.

Please fix and resubmit

Thanks,
Sandeep

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-07 19:30 ` Paulraj, Sandeep
@ 2010-06-07 21:17   ` Paulraj, Sandeep
  2010-06-08  8:36     ` Delio Brignoli
  0 siblings, 1 reply; 13+ messages in thread
From: Paulraj, Sandeep @ 2010-06-07 21:17 UTC (permalink / raw)
  To: u-boot


> >
> > I have updated this patch based on the comments [1] by Wolfgang Denk and
> > removed unused variables.
> > [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
> >
> > Reduce the number of reads per byte transferred on the BUF register from
> 2
> > to 1 and
> > take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138
> > EVM,
> > SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with
> a
> > 2Mbyte image file.
> > Remove unused variables in the spi_xfer() function.
> >
> > Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
> > Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>
> > ---
> 
> 
> Checkpatch returned 4 errors.
> 
> Please fix and resubmit
> 
> Thanks,
> Sandeep

I fixed them myself.

Pushed to u-boot-ti

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-07 21:17   ` Paulraj, Sandeep
@ 2010-06-08  8:36     ` Delio Brignoli
  0 siblings, 0 replies; 13+ messages in thread
From: Delio Brignoli @ 2010-06-08  8:36 UTC (permalink / raw)
  To: u-boot

Thank you Sandeep.
--
Delio

On 07/06/2010, at 23:17, Paulraj, Sandeep wrote:
>> Checkpatch returned 4 errors.
>> 
>> Please fix and resubmit
>> 
>> Thanks,
>> Sandeep
> 
> I fixed them myself.
> 
> Pushed to u-boot-ti

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-01 11:36 [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed Delio Brignoli
  2010-06-01 13:51 ` Ben Gardiner
  2010-06-07 19:30 ` Paulraj, Sandeep
@ 2010-06-17 15:02 ` Nick Thompson
  2010-06-17 15:10   ` Paulraj, Sandeep
  2010-06-17 17:38   ` Delio Brignoli
  2 siblings, 2 replies; 13+ messages in thread
From: Nick Thompson @ 2010-06-17 15:02 UTC (permalink / raw)
  To: u-boot

On 01/06/10 12:36, Delio Brignoli wrote:
> I have updated this patch based on the comments [1] by Wolfgang Denk and 
> removed unused variables.
> [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
> 
> Reduce the number of reads per byte transferred on the BUF register from 2 to 1 and
> take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM, 
> SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a 2Mbyte image file.
> Remove unused variables in the spi_xfer() function.
> 
> Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
> Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>

Sorry, I'm a bit late to the party on this.

I have an alternative patch that tries to be even quicker, but I
don't have the same platform as Delio, so can't compare like with
like.

This diff applies before Delio's patch. If it is any faster I am
prepared to create a proper patch. If nobody can test it for speed
I'll probably just drop it, since it produces a slightly bigger
executable and I don't know that it is actually any faster...

In essence, it splits up read and write operations to avoid testing
pointers in every loop iteration. It also unrolls the last iteration
so that it doesn't have to test for loop ending twice each time
round. Finally it avoids bit setting/clearing on each iteration when
the results would only turn out to be the same anyway.

Here's the diff:

diff --git a/drivers/spi/davinci_spi.c b/drivers/spi/davinci_spi.c
index 60ba007..a90d2f4 100644
--- a/drivers/spi/davinci_spi.c
+++ b/drivers/spi/davinci_spi.c
@@ -126,16 +126,98 @@ void spi_release_bus(struct spi_slave *slave)
 	writel(SPIGCR0_SPIRST_MASK, &ds->regs->gcr0);
 }
 
-int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
-		const void *dout, void *din, unsigned long flags)
+static inline u8 davinci_spi_read_data(struct davinci_spi_slave *ds, u32 data)
+{
+	unsigned int	buf_reg_val;
+	/* wait till TXFULL is deasserted */
+	while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK)
+		;
+	writel(data, &ds->regs->dat1);
+
+	/* read the data - wait for data availability */
+	while ((buf_reg_val = readl(&ds->regs->buf)) & SPIBUF_RXEMPTY_MASK)
+		;
+	return buf_reg_val & 0xFF;
+}
+
+static int davinci_spi_read(struct spi_slave *slave, unsigned int len,
+			    u8 *rxp, unsigned long flags)
 {
 	struct davinci_spi_slave *ds = to_davinci_spi(slave);
-	unsigned int	len, data1_reg_val = readl(&ds->regs->dat1);
-	int		ret, i;
-	const u8	*txp = dout; /* dout can be NULL for read operation */
-	u8		*rxp = din;  /* din can be NULL for write operation */
+	unsigned int data1_reg_val = readl(&ds->regs->dat1);
+
+	/* do an empty read to clear the current contents */
+	(void)readl(&ds->regs->buf);
+
+	/* enable CS hold */
+	data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
+			(slave->cs << SPIDAT1_CSNR_SHIFT));
+	data1_reg_val &= ~0xFFFF;
+
+	/* keep writing and reading 1 byte until only 1 byte left to read */
+	while ((len--) > 1) {
+		*rxp++ = davinci_spi_read_data(ds, data1_reg_val);
+	}
 
-	ret = 0;
+	/*
+	 * clear CS hold when we reach the end.
+	 */
+	if (flags & SPI_XFER_END)
+		data1_reg_val &= ~(1 << SPIDAT1_CSHOLD_SHIFT);
+
+	*rxp = davinci_spi_read_data(ds, data1_reg_val);
+
+	return 0;
+}
+
+static inline void davinci_spi_write_data(struct davinci_spi_slave *ds, u32 data)
+{
+	/* wait till TXFULL is deasserted */
+	while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK)
+		;
+	writel(data, &ds->regs->dat1);
+
+	/* wait for read data availability */
+	while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK)
+		;
+}
+
+static int davinci_spi_write(struct spi_slave *slave, unsigned int len,
+		const u8 *txp, unsigned long flags)
+{
+	struct davinci_spi_slave *ds = to_davinci_spi(slave);
+	unsigned int data1_reg_val = readl(&ds->regs->dat1);
+
+	/* do an empty read to clear the current contents */
+	(void)readl(&ds->regs->buf);
+
+	/* enable CS hold */
+	data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
+			(slave->cs << SPIDAT1_CSNR_SHIFT));
+	data1_reg_val &= ~0xFFFF;
+
+	/* keep writing and reading 1 byte until only 1 byte left to write */
+	while ((len--) > 1) {
+		/* write the data */
+		davinci_spi_write_data(ds, data1_reg_val | *txp++);
+	}
+
+	/*
+	 * clear CS hold when we reach the end.
+	 */
+	if (flags & SPI_XFER_END)
+		data1_reg_val &= ~(1 << SPIDAT1_CSHOLD_SHIFT);
+
+	/* write the data */
+	davinci_spi_write_data(ds, data1_reg_val | *txp);
+
+	return 0;
+}
+
+int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
+		const void *dout, void *din, unsigned long flags)
+{
+	unsigned int len;
 
 	if (bitlen == 0)
 		/* Finish any previously submitted transfers */
@@ -155,53 +237,15 @@ int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
 
 	len = bitlen / 8;
 
-	/* do an empty read to clear the current contents */
-	readl(&ds->regs->buf);
-
-	/* keep writing and reading 1 byte until done */
-	for (i = 0; i < len; i++) {
-		/* wait till TXFULL is asserted */
-		while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK);
-
-		/* write the data */
-		data1_reg_val &= ~0xFFFF;
-		if (txp) {
-			data1_reg_val |= *txp;
-			txp++;
-		}
-
-		/*
-		 * Write to DAT1 is required to keep the serial transfer going.
-		 * We just terminate when we reach the end.
-		 */
-		if ((i == (len - 1)) && (flags & SPI_XFER_END)) {
-			/* clear CS hold */
-			writel(data1_reg_val &
-				~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1);
-		} else {
-			/* enable CS hold */
-			data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
-					(slave->cs << SPIDAT1_CSNR_SHIFT));
-			writel(data1_reg_val, &ds->regs->dat1);
-		}
-
-		/* read the data - wait for data availability */
-		while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK);
-
-		if (rxp) {
-			*rxp = readl(&ds->regs->buf) & 0xFF;
-			rxp++;
-		} else {
-			/* simply drop the read character */
-			readl(&ds->regs->buf);
-		}
-	}
-	return 0;
+	if (din)
+		return davinci_spi_read(slave, len, din, flags);
+	else
+		return davinci_spi_write(slave, len, dout, flags);
 
 out:
 	if (flags & SPI_XFER_END) {
-		writel(data1_reg_val &
-			~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1);
+		u8 dummy = 0;
+		davinci_spi_write(slave, 1, &dummy, flags);
 	}
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-17 15:02 ` Nick Thompson
@ 2010-06-17 15:10   ` Paulraj, Sandeep
  2010-06-17 16:14     ` Nick Thompson
  2010-06-17 17:38   ` Delio Brignoli
  1 sibling, 1 reply; 13+ messages in thread
From: Paulraj, Sandeep @ 2010-06-17 15:10 UTC (permalink / raw)
  To: u-boot



> 
> On 01/06/10 12:36, Delio Brignoli wrote:
> > I have updated this patch based on the comments [1] by Wolfgang Denk and
> > removed unused variables.
> > [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
> >
> > Reduce the number of reads per byte transferred on the BUF register from
> 2 to 1 and
> > take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138
> EVM,
> > SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with
> a 2Mbyte image file.
> > Remove unused variables in the spi_xfer() function.
> >
> > Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
> > Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>
> 
> Sorry, I'm a bit late to the party on this.

It is late. Pull request already sent to Wolfgang
> 
> I have an alternative patch that tries to be even quicker, but I
> don't have the same platform as Delio, so can't compare like with
> like.

Compare it on your platform. I believe you have the OMAP L137.
And post the results.

> 
> This diff applies before Delio's patch. If it is any faster I am
> prepared to create a proper patch. If nobody can test it for speed
> I'll probably just drop it, since it produces a slightly bigger
> executable and I don't know that it is actually any faster...
> 
> In essence, it splits up read and write operations to avoid testing
> pointers in every loop iteration. It also unrolls the last iteration
> so that it doesn't have to test for loop ending twice each time
> round. Finally it avoids bit setting/clearing on each iteration when
> the results would only turn out to be the same anyway.
> 
> Here's the diff:
> 
> diff --git a/drivers/spi/davinci_spi.c b/drivers/spi/davinci_spi.c
> index 60ba007..a90d2f4 100644
> --- a/drivers/spi/davinci_spi.c
> +++ b/drivers/spi/davinci_spi.c
> @@ -126,16 +126,98 @@ void spi_release_bus(struct spi_slave *slave)
>  	writel(SPIGCR0_SPIRST_MASK, &ds->regs->gcr0);
>  }
> 
> -int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
> -		const void *dout, void *din, unsigned long flags)
> +static inline u8 davinci_spi_read_data(struct davinci_spi_slave *ds, u32
> data)
> +{
> +	unsigned int	buf_reg_val;
> +	/* wait till TXFULL is deasserted */
> +	while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK)
> +		;
> +	writel(data, &ds->regs->dat1);
> +
> +	/* read the data - wait for data availability */
> +	while ((buf_reg_val = readl(&ds->regs->buf)) & SPIBUF_RXEMPTY_MASK)
> +		;
> +	return buf_reg_val & 0xFF;
> +}
> +
> +static int davinci_spi_read(struct spi_slave *slave, unsigned int len,
> +			    u8 *rxp, unsigned long flags)
>  {
>  	struct davinci_spi_slave *ds = to_davinci_spi(slave);
> -	unsigned int	len, data1_reg_val = readl(&ds->regs->dat1);
> -	int		ret, i;
> -	const u8	*txp = dout; /* dout can be NULL for read operation */
> -	u8		*rxp = din;  /* din can be NULL for write operation */
> +	unsigned int data1_reg_val = readl(&ds->regs->dat1);
> +
> +	/* do an empty read to clear the current contents */
> +	(void)readl(&ds->regs->buf);
> +
> +	/* enable CS hold */
> +	data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
> +			(slave->cs << SPIDAT1_CSNR_SHIFT));
> +	data1_reg_val &= ~0xFFFF;
> +
> +	/* keep writing and reading 1 byte until only 1 byte left to read */
> +	while ((len--) > 1) {
> +		*rxp++ = davinci_spi_read_data(ds, data1_reg_val);
> +	}
> 
> -	ret = 0;
> +	/*
> +	 * clear CS hold when we reach the end.
> +	 */
> +	if (flags & SPI_XFER_END)
> +		data1_reg_val &= ~(1 << SPIDAT1_CSHOLD_SHIFT);
> +
> +	*rxp = davinci_spi_read_data(ds, data1_reg_val);
> +
> +	return 0;
> +}
> +
> +static inline void davinci_spi_write_data(struct davinci_spi_slave *ds,
> u32 data)
> +{
> +	/* wait till TXFULL is deasserted */
> +	while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK)
> +		;
> +	writel(data, &ds->regs->dat1);
> +
> +	/* wait for read data availability */
> +	while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK)
> +		;
> +}
> +
> +static int davinci_spi_write(struct spi_slave *slave, unsigned int len,
> +		const u8 *txp, unsigned long flags)
> +{
> +	struct davinci_spi_slave *ds = to_davinci_spi(slave);
> +	unsigned int data1_reg_val = readl(&ds->regs->dat1);
> +
> +	/* do an empty read to clear the current contents */
> +	(void)readl(&ds->regs->buf);
> +
> +	/* enable CS hold */
> +	data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
> +			(slave->cs << SPIDAT1_CSNR_SHIFT));
> +	data1_reg_val &= ~0xFFFF;
> +
> +	/* keep writing and reading 1 byte until only 1 byte left to write
> */
> +	while ((len--) > 1) {
> +		/* write the data */
> +		davinci_spi_write_data(ds, data1_reg_val | *txp++);
> +	}
> +
> +	/*
> +	 * clear CS hold when we reach the end.
> +	 */
> +	if (flags & SPI_XFER_END)
> +		data1_reg_val &= ~(1 << SPIDAT1_CSHOLD_SHIFT);
> +
> +	/* write the data */
> +	davinci_spi_write_data(ds, data1_reg_val | *txp);
> +
> +	return 0;
> +}
> +
> +int spi_xfer(struct spi_slave *slave, unsigned int bitlen,
> +		const void *dout, void *din, unsigned long flags)
> +{
> +	unsigned int len;
> 
>  	if (bitlen == 0)
>  		/* Finish any previously submitted transfers */
> @@ -155,53 +237,15 @@ int spi_xfer(struct spi_slave *slave, unsigned int
> bitlen,
> 
>  	len = bitlen / 8;
> 
> -	/* do an empty read to clear the current contents */
> -	readl(&ds->regs->buf);
> -
> -	/* keep writing and reading 1 byte until done */
> -	for (i = 0; i < len; i++) {
> -		/* wait till TXFULL is asserted */
> -		while (readl(&ds->regs->buf) & SPIBUF_TXFULL_MASK);
> -
> -		/* write the data */
> -		data1_reg_val &= ~0xFFFF;
> -		if (txp) {
> -			data1_reg_val |= *txp;
> -			txp++;
> -		}
> -
> -		/*
> -		 * Write to DAT1 is required to keep the serial transfer going.
> -		 * We just terminate when we reach the end.
> -		 */
> -		if ((i == (len - 1)) && (flags & SPI_XFER_END)) {
> -			/* clear CS hold */
> -			writel(data1_reg_val &
> -				~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1);
> -		} else {
> -			/* enable CS hold */
> -			data1_reg_val |= ((1 << SPIDAT1_CSHOLD_SHIFT) |
> -					(slave->cs << SPIDAT1_CSNR_SHIFT));
> -			writel(data1_reg_val, &ds->regs->dat1);
> -		}
> -
> -		/* read the data - wait for data availability */
> -		while (readl(&ds->regs->buf) & SPIBUF_RXEMPTY_MASK);
> -
> -		if (rxp) {
> -			*rxp = readl(&ds->regs->buf) & 0xFF;
> -			rxp++;
> -		} else {
> -			/* simply drop the read character */
> -			readl(&ds->regs->buf);
> -		}
> -	}
> -	return 0;
> +	if (din)
> +		return davinci_spi_read(slave, len, din, flags);
> +	else
> +		return davinci_spi_write(slave, len, dout, flags);
> 
>  out:
>  	if (flags & SPI_XFER_END) {
> -		writel(data1_reg_val &
> -			~(1 << SPIDAT1_CSHOLD_SHIFT), &ds->regs->dat1);
> +		u8 dummy = 0;
> +		davinci_spi_write(slave, 1, &dummy, flags);
>  	}
>  	return 0;
>  }
> _______________________________________________
> U-Boot mailing list
> U-Boot at lists.denx.de
> http://lists.denx.de/mailman/listinfo/u-boot

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-17 15:10   ` Paulraj, Sandeep
@ 2010-06-17 16:14     ` Nick Thompson
  0 siblings, 0 replies; 13+ messages in thread
From: Nick Thompson @ 2010-06-17 16:14 UTC (permalink / raw)
  To: u-boot

On 17/06/10 16:10, Paulraj, Sandeep wrote:
> 
> 
>>
>> On 01/06/10 12:36, Delio Brignoli wrote:
>>> I have updated this patch based on the comments [1] by Wolfgang Denk and
>>> removed unused variables.
>>> [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
>>>
>>> Reduce the number of reads per byte transferred on the BUF register from
>> 2 to 1 and
>>> take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138
>> EVM,
>>> SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with
>> a 2Mbyte image file.
>>> Remove unused variables in the spi_xfer() function.
>>>
>>> Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>
>>> Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>
>>
>> Sorry, I'm a bit late to the party on this.
> 
> It is late. Pull request already sent to Wolfgang
>>
>> I have an alternative patch that tries to be even quicker, but I
>> don't have the same platform as Delio, so can't compare like with
>> like.
> 
> Compare it on your platform. I believe you have the OMAP L137.
> And post the results.

I don't have a scope to get an accurate measure. The best I can do
right now is use a serial snooper to time between me pressing return
and the next prompt turning up.

To try and drown out inaccuracies and delays, I ran:

	sf read 0xc0008000 0 0x800000

So 8MiB in a reasonably consistent 5.62 - 5.63 seconds, which is about
1.49MiB/s by my reckoning. A bit faster, but way short of 6.25MiB/s.

Nick.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-17 15:02 ` Nick Thompson
  2010-06-17 15:10   ` Paulraj, Sandeep
@ 2010-06-17 17:38   ` Delio Brignoli
  2010-06-18  8:26     ` Nick Thompson
  1 sibling, 1 reply; 13+ messages in thread
From: Delio Brignoli @ 2010-06-17 17:38 UTC (permalink / raw)
  To: u-boot

Hello Nick,

On 17/06/2010, at 17:02, Nick Thompson wrote:
> On 01/06/10 12:36, Delio Brignoli wrote:
>> I have updated this patch based on the comments [1] by Wolfgang Denk and 
>> removed unused variables.
>> [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
[...]
> I have an alternative patch that tries to be even quicker, but I
> don't have the same platform as Delio, so can't compare like with
> like.

I will test your patch on my L138 and let you know.

> In essence, it splits up read and write operations to avoid testing
> pointers in every loop iteration. It also unrolls the last iteration
> so that it doesn't have to test for loop ending twice each time
> round. Finally it avoids bit setting/clearing on each iteration when
> the results would only turn out to be the same anyway.

I am wondering if it is OK to split reading and writing: when I wrote
the patch I assumed that spi_xfer() was supposed to be able to read and
write to two distinct buffers at the same time. As I understand it, your
patch allows spi_xfer() to be either a read or a write operation but not both 
and I do not know if this semantic change is acceptable. Maybe Sekhar or 
Sandeep can clarify this.

Thanks
--
Delio

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-17 17:38   ` Delio Brignoli
@ 2010-06-18  8:26     ` Nick Thompson
  2010-06-18  8:49       ` Wolfgang Wegner
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Thompson @ 2010-06-18  8:26 UTC (permalink / raw)
  To: u-boot

On 17/06/10 18:38, Delio Brignoli wrote:
> Hello Nick,
> 
> On 17/06/2010, at 17:02, Nick Thompson wrote:
>> On 01/06/10 12:36, Delio Brignoli wrote:
>>> I have updated this patch based on the comments [1] by Wolfgang Denk and 
>>> removed unused variables.
>>> [1][http://lists.denx.de/pipermail/u-boot/2010-May/071728.html]
> [...]
>> I have an alternative patch that tries to be even quicker, but I
>> don't have the same platform as Delio, so can't compare like with
>> like.
> 
> I will test your patch on my L138 and let you know.

That would be great, thanks - I'm using a SPANSION FLASH, though I
don't think that makes any difference. It your results are consistent
I will be more confident with my L137 platform as a comparison.

> 
>> In essence, it splits up read and write operations to avoid testing
>> pointers in every loop iteration. It also unrolls the last iteration
>> so that it doesn't have to test for loop ending twice each time
>> round. Finally it avoids bit setting/clearing on each iteration when
>> the results would only turn out to be the same anyway.
> 
> I am wondering if it is OK to split reading and writing: when I wrote
> the patch I assumed that spi_xfer() was supposed to be able to read and
> write to two distinct buffers at the same time. As I understand it, your
> patch allows spi_xfer() to be either a read or a write operation but not both 
> and I do not know if this semantic change is acceptable. Maybe Sekhar or 
> Sandeep can clarify this.

I wondered about that too. I don't think it matters for SPI FLASH, but I
guess it could for other SPI devices (Audio?).

I was thinking I could add a third routine 'read_write' that does both,
which only gets called if a bi-direction transfer is requested. This
would make the code even bigger, though it could be conditional on a
CONFIG option so it could be excluded if only FLASH support is required.

(I'm not sure if U-Boot uses SPI for anything else anyway...)

In the mean time, I have reads working at 1.76MiB/s. I need to read the
SPI manual carefully before I let that change into the wild though.
Basically, I've added an assumption that by the time the RX buffer is
ready with the returned bits, the TXer must have completed since they
run to the same clock: so no need to test for TX not full in the loop.

Nick.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-06-18  8:26     ` Nick Thompson
@ 2010-06-18  8:49       ` Wolfgang Wegner
  0 siblings, 0 replies; 13+ messages in thread
From: Wolfgang Wegner @ 2010-06-18  8:49 UTC (permalink / raw)
  To: u-boot

Hi Nick,

On Fri, Jun 18, 2010 at 09:26:59AM +0100, Nick Thompson wrote:
> 
> (I'm not sure if U-Boot uses SPI for anything else anyway...)

it can be useful for booting an FPGA. On some architectures this
is faster than GPIO bit bang. This is of course board dependent,
as implementation of the actual FPGA functions is in the board
code anyways. (Using this in a not-yet-published board.)

Another use could be SPI-controlled displays (framebuffer driver
for displaying a boot loge etc.). Did not have time to implement
this yet, unfortunately... ;-)

Best regards,
Wolfgang

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-05-21 13:15 ` [U-Boot] [PATCH] DaVinci: " Ben Gardiner
@ 2010-05-21 13:18   ` Ben Gardiner
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Gardiner @ 2010-05-21 13:18 UTC (permalink / raw)
  To: u-boot

On Fri, May 21, 2010 at 9:15 AM, Ben Gardiner
<bengardiner@nanometrics.ca> wrote:
> Applies cleanly and compiles with da850evm_config on u-boot-omap-l1
> master branch -- 5f16b85 of 2f05e39 . Also running the resulting

Sorry, that should be 5f16b85 of
git://arago-project.org/git/people/sekhar/u-boot-omapl1.git .

-- 
Ben Gardiner
Nanometrics Inc.
+1 (613) 592-6776 x239
http://www.nanometrics.ca

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed.
  2010-05-13 12:57 [U-Boot] [PATCH] " Delio Brignoli
@ 2010-05-21 13:15 ` Ben Gardiner
  2010-05-21 13:18   ` Ben Gardiner
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Gardiner @ 2010-05-21 13:15 UTC (permalink / raw)
  To: u-boot

On Thu, May 13, 2010 at 8:57 AM, Delio Brignoli
<dbrignoli@audioscience.com> wrote:
> Reduce the number of reads per byte transferred on the BUF register from 2 to 1 and
> take advantage of the TX buffer in the SPI module. On LogicPD OMAP-L138 EVM,
> SPI read throughput goes up from ~0.8Mbyte/s to ~1.3Mbyte/s. Tested with a 2Mbyte image file.
>
> Signed-off-by: Delio Brignoli <dbrignoli@audioscience.com>

Applies cleanly on u-boot master branch -- 2f05e39 of
git://git.denx.de/u-boot.git

Applies cleanly and compiles with da850evm_config on u-boot-omap-l1
master branch -- 5f16b85 of 2f05e39 . Also running the resulting
u-boot.bin did show a noticeable improvement in kernel load time.

Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>

-- 
Ben Gardiner
Nanometrics Inc.
+1 (613) 592-6776 x239
http://www.nanometrics.ca

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-06-18  8:49 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-01 11:36 [U-Boot] [PATCH] DaVinci: Improve DaVinci SPI speed Delio Brignoli
2010-06-01 13:51 ` Ben Gardiner
2010-06-07 19:30 ` Paulraj, Sandeep
2010-06-07 21:17   ` Paulraj, Sandeep
2010-06-08  8:36     ` Delio Brignoli
2010-06-17 15:02 ` Nick Thompson
2010-06-17 15:10   ` Paulraj, Sandeep
2010-06-17 16:14     ` Nick Thompson
2010-06-17 17:38   ` Delio Brignoli
2010-06-18  8:26     ` Nick Thompson
2010-06-18  8:49       ` Wolfgang Wegner
  -- strict thread matches above, loose matches on Subject: below --
2010-05-13 12:57 [U-Boot] [PATCH] " Delio Brignoli
2010-05-21 13:15 ` [U-Boot] [PATCH] DaVinci: " Ben Gardiner
2010-05-21 13:18   ` Ben Gardiner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.