linux-spi.vger.kernel.org archive mirror
* [PATCH] atmel_spi: fix hang due to missed interrupt
@ 2008-07-31 17:10 Haavard Skinnemoen
       [not found] ` <1217524213-4027-1-git-send-email-haavard.skinnemoen-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Haavard Skinnemoen @ 2008-07-31 17:10 UTC (permalink / raw)
  To: David Brownell
  Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Gerard Kam,
	Haavard Skinnemoen, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Lars Steubesand

From: Gerard Kam <gerardk5-H+0wwilmMs3R7s880joybQ@public.gmane.org>

For some time my at91sam9260 board with JFFS2 on serial flash (m25p80) would
hang when accessing the serial flash and SPI bus.  Slowing the SPI clock
down to 9 MHz reduced the occurrence of the hang from "always" during boot
to a nuisance level that allowed other SW development to continue.  I finally
had to address this issue when an application stressed the I/O enough to
always cause a hang.

The hang seems to be caused by a missed SPI interrupt, so that the task ends
up waiting forever after calling spi_sync().  The fix has two parts.  The
first is to halt the DMA engine before the "current" PDC registers are
loaded.  This ensures that the "next" registers are loaded before the DMA
operation takes off.  The second part of the fix is a kludge that adds a
"completion" interrupt (RXBUFF) in case the ENDRX interrupt for the last
segment of the DMA chaining operation was missed.

The patch allows the SPI clock for the serial flash to be increased from 9
MHz to 15 MHz (or more?).  No hangs or SPI overruns were encountered.

Signed-off-by: Gerard Kam <gerardk5-H+0wwilmMs3R7s880joybQ@public.gmane.org>

While this patch does indeed improve things, I still see overruns and
CRC errors on my NGW100 board when running the DataFlash at 10 MHz.
However, I think some improvement is better than nothing, so I'm
passing this on for inclusion in 2.6.27.

Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org>
---
 drivers/spi/atmel_spi.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/spi/atmel_spi.c b/drivers/spi/atmel_spi.c
index 0c71656..95190c6 100644
--- a/drivers/spi/atmel_spi.c
+++ b/drivers/spi/atmel_spi.c
@@ -184,7 +184,8 @@ static void atmel_spi_next_xfer(struct spi_master *master,
 {
 	struct atmel_spi	*as = spi_master_get_devdata(master);
 	struct spi_transfer	*xfer;
-	u32			len, remaining, total;
+	u32			len, remaining;
+	u32			ieval;
 	dma_addr_t		tx_dma, rx_dma;
 
 	if (!as->current_transfer)
@@ -197,6 +198,8 @@ static void atmel_spi_next_xfer(struct spi_master *master,
 		xfer = NULL;
 
 	if (xfer) {
+		spi_writel(as, PTCR, SPI_BIT(RXTDIS) | SPI_BIT(TXTDIS));
+
 		len = xfer->len;
 		atmel_spi_next_xfer_data(master, xfer, &tx_dma, &rx_dma, &len);
 		remaining = xfer->len - len;
@@ -234,6 +237,8 @@ static void atmel_spi_next_xfer(struct spi_master *master,
 	as->next_transfer = xfer;
 
 	if (xfer) {
+		u32	total;
+
 		total = len;
 		atmel_spi_next_xfer_data(master, xfer, &tx_dma, &rx_dma, &len);
 		as->next_remaining_bytes = total - len;
@@ -250,9 +255,11 @@ static void atmel_spi_next_xfer(struct spi_master *master,
 			"  next xfer %p: len %u tx %p/%08x rx %p/%08x\n",
 			xfer, xfer->len, xfer->tx_buf, xfer->tx_dma,
 			xfer->rx_buf, xfer->rx_dma);
+		ieval = SPI_BIT(ENDRX) | SPI_BIT(OVRES);
 	} else {
 		spi_writel(as, RNCR, 0);
 		spi_writel(as, TNCR, 0);
+		ieval = SPI_BIT(RXBUFF) | SPI_BIT(ENDRX) | SPI_BIT(OVRES);
 	}
 
 	/* REVISIT: We're waiting for ENDRX before we start the next
@@ -265,7 +272,7 @@ static void atmel_spi_next_xfer(struct spi_master *master,
 	 *
 	 * It should be doable, though. Just not now...
 	 */
-	spi_writel(as, IER, SPI_BIT(ENDRX) | SPI_BIT(OVRES));
+	spi_writel(as, IER, ieval);
 	spi_writel(as, PTCR, SPI_BIT(TXTEN) | SPI_BIT(RXTEN));
 }
 
@@ -396,7 +403,7 @@ atmel_spi_interrupt(int irq, void *dev_id)
 
 		ret = IRQ_HANDLED;
 
-		spi_writel(as, IDR, (SPI_BIT(ENDTX) | SPI_BIT(ENDRX)
+		spi_writel(as, IDR, (SPI_BIT(RXBUFF) | SPI_BIT(ENDRX)
 				     | SPI_BIT(OVRES)));
 
 		/*
@@ -418,7 +425,7 @@ atmel_spi_interrupt(int irq, void *dev_id)
 		if (xfer->delay_usecs)
 			udelay(xfer->delay_usecs);
 
-		dev_warn(master->dev.parent, "fifo overrun (%u/%u remaining)\n",
+		dev_warn(master->dev.parent, "overrun (%u/%u remaining)\n",
 			 spi_readl(as, TCR), spi_readl(as, RCR));
 
 		/*
@@ -442,7 +449,7 @@ atmel_spi_interrupt(int irq, void *dev_id)
 		spi_readl(as, SR);
 
 		atmel_spi_msg_done(master, as, msg, -EIO, 0);
-	} else if (pending & SPI_BIT(ENDRX)) {
+	} else if (pending & (SPI_BIT(RXBUFF) | SPI_BIT(ENDRX))) {
 		ret = IRQ_HANDLED;
 
 		spi_writel(as, IDR, pending);
-- 
1.5.6.3




* Re: [PATCH] atmel_spi: fix hang due to missed interrupt
       [not found] ` <1217524213-4027-1-git-send-email-haavard.skinnemoen-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org>
@ 2008-08-01 13:49   ` Haavard Skinnemoen
  2008-08-01 20:07     ` Gerard Kam
  0 siblings, 1 reply; 3+ messages in thread
From: Haavard Skinnemoen @ 2008-08-01 13:49 UTC (permalink / raw)
  To: Gerard Kam
  Cc: David Brownell,
	spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Lars Steubesand

Haavard Skinnemoen <haavard.skinnemoen-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org> wrote:
>  		spi_writel(as, RNCR, 0);
>  		spi_writel(as, TNCR, 0);
> +		ieval = SPI_BIT(RXBUFF) | SPI_BIT(ENDRX) | SPI_BIT(OVRES);

Actually, I think the real bug happens right here: Writing 0 to RNCR
will clear any pending ENDRX interrupt, so if the transfer has completed
before this, we won't see any interrupt. These writes are also
completely pointless -- RNCR is zeroed automatically after it gets
shifted into RCR. TNCR works the same way.
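
The race described above can be sketched with a toy model of one PDC
receive channel. This is only an illustration under the assumptions stated
in this thread (ENDRX is a latched flag that any RNCR write clears, RXBUFF
is a level that stays set while both counters are zero). The struct and
function names are hypothetical and are not part of atmel_spi.c, and the
model does not claim to match the hardware in every detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one PDC receive channel (hypothetical, not the real hardware). */
struct pdc_rx {
	uint32_t rcr;	/* "current" receive counter */
	uint32_t rncr;	/* "next" receive counter */
	bool	 endrx;	/* latched: RCR reached zero since the last counter write */
};

/* Program the current counter; a counter write clears the latched flag. */
static void pdc_write_rcr(struct pdc_rx *p, uint32_t len)
{
	assert(len > 0);
	p->rcr = len;
	p->endrx = false;
}

/* Assumption from the thread: any RNCR write clears ENDRX, even a write of 0. */
static void pdc_write_rncr(struct pdc_rx *p, uint32_t len)
{
	p->rncr = len;
	p->endrx = false;
}

/* RXBUFF modeled as a level: set while both counters are zero. */
static bool pdc_rxbuff(const struct pdc_rx *p)
{
	return p->rcr == 0 && p->rncr == 0;
}

/* Receive up to n bytes; when RCR hits zero, the next counter is shifted in. */
static void pdc_receive(struct pdc_rx *p, uint32_t n)
{
	for (; n > 0 && p->rcr > 0; n--) {
		if (--p->rcr == 0) {
			p->endrx = true;
			p->rcr = p->rncr;
			p->rncr = 0;	/* zeroed automatically after the shift */
		}
	}
}

/*
 * The transfer completes before the driver gets around to the redundant
 * "RNCR = 0" write.  Returns true if an interrupt source is still pending
 * afterwards, given whether RXBUFF is enabled in addition to ENDRX.
 */
bool irq_pending_after_late_rncr_write(bool rxbuff_enabled)
{
	struct pdc_rx p = { 0 };

	pdc_write_rcr(&p, 4);
	pdc_receive(&p, 4);	/* transfer finishes early: ENDRX is latched */
	pdc_write_rncr(&p, 0);	/* the pointless write drops ENDRX */

	return p.endrx || (rxbuff_enabled && pdc_rxbuff(&p));
}
```

In this model, waiting on ENDRX alone leaves nothing pending after the
late write, which matches the observed hang. Enabling RXBUFF as well keeps
a source asserted, because a write of zero does not change the level.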

The RXBUFF interrupt is only cleared by writing a nonzero RCR or RNCR,
so your patch should fix it. But I'm wondering if there may be another
race left to fix: If we queue two transfers, and both of them complete
before we handle the interrupt, I think we only consider one of them to
be complete. If RXBUFF is set, we should complete any "next" transfer
we have queued up as well.

It could be your patch fixes this last case too -- when this happens,
RXBUFF stays set when we return from the interrupt handler, so the
interrupt gets retriggered immediately. We could handle this more
efficiently, but I think it's handled correctly with your patch applied.
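
The retriggering argument can be sketched the same way. The model below is
a deliberately simplified assumption, not the driver's control flow: two
queued transfers have both finished before the handler runs, the handler
completes exactly one transfer per invocation, and RXBUFF is treated as a
level that keeps the interrupt asserted while completed work is still
outstanding. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the controller and driver state. */
struct spi_model {
	int  pending;		/* transfers finished by DMA, not yet completed in software */
	bool endrx_latched;	/* a single latched event, however many transfers finished */
	bool rxbuff_enabled;	/* whether the RXBUFF level interrupt is unmasked */
};

/* The interrupt line: the latched ENDRX event, or the RXBUFF level if enabled. */
static bool irq_asserted(const struct spi_model *m)
{
	return m->endrx_latched || (m->rxbuff_enabled && m->pending > 0);
}

/* Handler as described in the thread: one transfer is completed per call. */
static void irq_handler(struct spi_model *m)
{
	assert(m->pending > 0);
	m->endrx_latched = false;	/* acknowledge the latched event */
	m->pending--;
}

/* Run the handler for as long as the line stays asserted. */
int transfers_left_after_irqs(bool rxbuff_enabled)
{
	struct spi_model m = {
		.pending = 2,
		.endrx_latched = true,
		.rxbuff_enabled = rxbuff_enabled,
	};

	while (irq_asserted(&m))
		irq_handler(&m);

	return m.pending;
}
```

With ENDRX alone, one transfer is left uncompleted in this model. With
RXBUFF enabled, the handler runs a second time and both transfers are
completed, at the cost of one extra interrupt entry.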

I'll see if I can find a way to clean up the somewhat headache-inducing
control flow in this driver, but until then, your patch should
definitely improve things.

As for the overruns, I'm beginning to suspect that the only way to get
rid of those and still maintain a reasonable transfer rate is to use
bounce buffers in faster RAM (e.g. on-chip SRAM).

Haavard



* RE: [PATCH] atmel_spi: fix hang due to missed interrupt
  2008-08-01 13:49   ` Haavard Skinnemoen
@ 2008-08-01 20:07     ` Gerard Kam
  0 siblings, 0 replies; 3+ messages in thread
From: Gerard Kam @ 2008-08-01 20:07 UTC (permalink / raw)
  To: 'Haavard Skinnemoen'
  Cc: 'David Brownell',
	spi-devel-general, 'Lars Steubesand',
	linux-kernel

Hi there

> -----Original Message-----
> From: Haavard Skinnemoen [mailto:haavard.skinnemoen@atmel.com]
> Sent: Friday, August 01, 2008 6:50 AM
> 
> Haavard Skinnemoen <haavard.skinnemoen@atmel.com> wrote:
> >  		spi_writel(as, RNCR, 0);
> >  		spi_writel(as, TNCR, 0);
 
> These writes are also completely pointless -- RNCR is zeroed 
> automatically after it gets shifted into RCR.

While looking at the patch yesterday I was thinking the same thing.  Now it
bugs me that this observation didn't occur to me when I was working on this
problem.  Maybe the code symmetry makes it look "correct".

> Actually, I think the real bug happens right here

You're probably correct.  A race condition that intermittently clears a
pending interrupt fits the observed symptom.

 
> As for the overruns, I'm beginning to suspect that the only way to get
> rid of those and still maintain a reasonable transfer rate is to use
> bounce buffers in faster RAM (e.g. on-chip SRAM).

For my at91sam9260 board, I eliminated one cause of SPI overruns by lowering
the interrupt priorities of the six USARTs (default was 5, changed to 4)
relative to the two SPI controllers (default is 5).  The test I used for
this issue is 'ls -lR' on the flash filesystem.

Regards -- Gerard

