linux-spi.vger.kernel.org archive mirror
* kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
@ 2009-04-05  3:32 Mok Keith
       [not found] ` <69f617130904042032o382f5084v4fe21884e2356c77-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Mok Keith @ 2009-04-05  3:32 UTC (permalink / raw)
  To: linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW,
	spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

Hi all,

I have encountered a kernel panic, accompanied by the message "pxa2xx-spi
pxa2xx-spi.1: dma_transfer: fifo overrun".
After digging into the code using the kernel panic log, I found that
cur_chip is NULL in the pump_transfers() function.

It is very easy to reproduce on my system, which runs a PXA9xx CPU with
DMA enabled (SPI works fine with pure PIO).
However, if I add some printk() calls for debugging, the problem goes away.

So I cannot figure out why tasklet_schedule() for pump_transfers() is
called after giveback() has been called, without cur_chip being set
first.

Does anyone have any idea?

Keith


* Re: kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
       [not found] ` <69f617130904042032o382f5084v4fe21884e2356c77-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-04-05 17:07   ` Ned Forrester
       [not found]     ` <49D8E537.1010307-/d+BM93fTQY@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Ned Forrester @ 2009-04-05 17:07 UTC (permalink / raw)
  To: Mok Keith
  Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW

Mok Keith wrote:
> Hi all,
> 
> I have encountered a kernel panic, accompanied by the message "pxa2xx-spi
> pxa2xx-spi.1: dma_transfer: fifo overrun".
> After digging into the code using the kernel panic log, I found that
> cur_chip is NULL in the pump_transfers() function.
> 
> It is very easy to reproduce on my system, which runs a PXA9xx CPU with
> DMA enabled (SPI works fine with pure PIO).
> However, if I add some printk() calls for debugging, the problem goes away.
> 
> So I cannot figure out why tasklet_schedule() for pump_transfers() is
> called after giveback() has been called, without cur_chip being set
> first.
> 
> Does anyone have any idea?

Some.  I have worked on this driver a lot, but it has been a while, so I
might overlook some things.

First, the panic is probably caused by these declarations in
pump_transfers():

	u32 dma_thresh = drv_data->cur_chip->dma_threshold;
	u32 dma_burst = drv_data->cur_chip->dma_burst_size;

and, of course, uses of "chip" after this assignment:

	chip = drv_data->cur_chip;

These assignments are performed without checking the validity of
cur_chip.  That should be OK in the "standard use" of pxa2xx_spi,
because pump_transfers() is only supposed to be called between calls to
pump_messages(), where cur_chip is set, and calls to giveback() or
start_queue(), where cur_chip is cleared.

By "standard use", I mean use of the SPI bus with Linux as the master
(the pxa processor is generating the SPI clock), and normal SPI
transfers where every bit received matches a bit transmitted.  In this
mode, it is hard to imagine how there would be FIFO overrun errors in
DMA mode, because the clock will stop when the TX buffer is empty, and
there should be a matching RX buffer that is filled by the DMA hardware,
thus keeping the SSP receiver FIFO from filling.  The only way I can
imagine DMA allowing the receiver FIFO to fill would be if silly values
of burst and threshold were used, but these are set by the driver, so
they should be OK.

Is your application using the SSP in some unusual way that allows the RX
FIFO to overrun?  I am not familiar with any PXA9xx chips.  What clock
speed are you using?  What timeout setting are you using?  Are you using
power management with suspend/resume?

I have seen FIFO overruns in my application, but I use a heavily
modified version of pxa2xx_spi.c that implements descriptor-fetch DMA,
enables external clocks, and uses read-without-transmit (RWOT) mode, to
collect data from an 11Mbit/sec external master.  Doing these things can
easily overrun the FIFO, but it only happens when I fail to keep the
chain of DMA descriptors supplied with empty buffers (and now I have
fixed that, too, so that I can read data continuously, forever).  The
DMA hardware itself never fails to keep up, so I don't see why you would
get overruns in DMA mode.

Are you sure that your transfers are actually operating in DMA mode?
The driver reverts to PIO mode for any transfer that exceeds 8191 bytes
in length.  The driver is not yet coded to break long transfers into
shorter segments that are within the length that the DMA hardware can
handle, so it just uses PIO mode for long transfers; this is a known
deficiency that someone might fix in the future.
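As a rough illustration of the mode selection described above, here is a minimal user-space sketch, not the driver's code. It assumes DMA is used only when enable_dma is set and the transfer is at most 8191 bytes; the 8-byte buffer-alignment check is an added assumption (Keith's later reply mentions DMA alignment). All names (sketch_use_dma, sketch_is_dma_aligned, SKETCH_MAX_DMA_LEN) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-transfer DMA length limit quoted in the thread. */
#define SKETCH_MAX_DMA_LEN 8191u

/* Hypothetical alignment test; 8-byte alignment is an assumption. */
static bool sketch_is_dma_aligned(const void *buf)
{
	return ((uintptr_t)buf & 0x7u) == 0;
}

/* Returns true if the transfer would go to DMA, false if it would
 * fall back to PIO.  NULL buffers (half-duplex) are not checked. */
static bool sketch_use_dma(bool enable_dma, const void *tx,
			   const void *rx, size_t len)
{
	if (!enable_dma)
		return false;
	if (len > SKETCH_MAX_DMA_LEN)
		return false;	/* no segmentation, so long transfers use PIO */
	if (tx && !sketch_is_dma_aligned(tx))
		return false;
	if (rx && !sketch_is_dma_aligned(rx))
		return false;
	return true;
}
```

In this model a 512-byte aligned transfer takes the DMA path, while anything longer than 8191 bytes silently falls back to PIO, which is why confirming the actual mode matters when debugging.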

All that said, in my modified driver, I did change the above
initialized declarations into plain declarations, and checked the
validity of cur_chip before making the assignments later in the
function.  I don't recall exactly which circumstance resulted in
execution of pump_transfers() without a valid cur_chip, but it happened
with my very non-standard application.  In my case, I chose to return
silently if cur_chip was NULL, but one could issue a message, of course.
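The guard described above can be modelled in user space as follows. This is a minimal sketch with hypothetical, cut-down structs (sketch_chip, sketch_drv) and functions named after, but not identical to, pump_messages(), giveback() and pump_transfers(); it is not the driver's code. As the thread stresses, such a guard only hides an out-of-sequence call rather than explaining it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: only the fields discussed here are modelled. */
struct sketch_chip {
	uint32_t dma_threshold;
	uint32_t dma_burst_size;
};

struct sketch_drv {
	struct sketch_chip *cur_chip;	/* valid only while a message is active */
	unsigned int transfers_pumped;	/* normal passes */
	unsigned int stray_calls;	/* out-of-sequence passes */
};

/* Analogue of pump_messages(): a message becomes current. */
static void sketch_pump_messages(struct sketch_drv *drv,
				 struct sketch_chip *chip)
{
	drv->cur_chip = chip;
}

/* Analogue of giveback(): the message is finished. */
static void sketch_giveback(struct sketch_drv *drv)
{
	drv->cur_chip = NULL;
}

/* Analogue of pump_transfers(): plain declarations first, then read
 * cur_chip only after checking it.  Returns 0 on a normal pass and
 * -1 if called while no message is active. */
static int sketch_pump_transfers(struct sketch_drv *drv)
{
	struct sketch_chip *chip;
	uint32_t dma_thresh;
	uint32_t dma_burst;

	assert(drv != NULL);
	chip = drv->cur_chip;
	if (chip == NULL) {
		/* Return silently; a message could be logged here instead. */
		drv->stray_calls++;
		return -1;
	}
	dma_thresh = chip->dma_threshold;
	dma_burst = chip->dma_burst_size;
	(void)dma_thresh;	/* a real driver would program DMA with these */
	(void)dma_burst;
	drv->transfers_pumped++;
	return 0;
}
```

Calling sketch_pump_transfers() after sketch_giveback() reproduces the failing order from the original report; here it is counted and ignored instead of dereferencing NULL.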

I would bet that the fundamental cause of your problem is the FIFO
overrun.  With some more information about your setup and use of
pxa2xx_spi, I might be able to provide more clues.  I would hesitate to
simply patch the above assignments without first understanding why
pump_transfers() is being executed out of sequence.

-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079



* Re: kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
       [not found]     ` <49D8E537.1010307-/d+BM93fTQY@public.gmane.org>
@ 2009-04-06  2:22       ` Mok Keith
       [not found]         ` <69f617130904051922w72810b52v576546c10c069941-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Mok Keith @ 2009-04-06  2:22 UTC (permalink / raw)
  To: Ned Forrester
  Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW

Hi Ned,

> Is your application using the SSP in some unusual way that allows the RX
> FIFO to overrun?  I am not familiar with any PXA9xx chips.  What clock
> speed are you using?  What timeout setting are you using?  Are you using
> power management with suspend/resume?

No, it does not allow RX FIFO overruns.

Here are the settings in the arch code:
static struct pxa2xx_spi_chip libertas_spi = {
        .tx_threshold           = 7,
        .rx_threshold           = 8,
        .cs_control             = libertas_spi_cs,
        .dma_burst_size         = 8,
        .timeout                = 230,
};

Here are the settings in the driver:
        spi->mode = SPI_MODE_0;
        spi->max_speed_hz = 1000000; /* REVISIT max=50MHz */
        spi->bits_per_word = 16;
        ret = spi_setup(spi);

I set the speed to only 1MHz ("spi->max_speed_hz = 1000000;"), which
should be very low.
No power management is enabled; the problem happens at the very
beginning, during firmware download to the chip.
I noticed that if I increase the timeout to 1000, neither the panic nor
the FIFO overrun happens. But the chip does not run after the firmware
is downloaded (it is okay in pure PIO mode).

>
> I have seen FIFO overruns in my application, but I use a heavily
> modified version of pxa2xx_spi.c that implements descriptor-fetch DMA,
> enables external clocks, and uses read-without-transmit (RWOT) mode, to
> collect data from an 11Mbit/sec external master.

I have not modified pxa2xx_spi.c in any way.

> Are you sure that your transfers are actually operating in DMA mode?
> The driver reverts to PIO mode for any transfer that exceeds 8191 bytes
> in length.

I am sure it is in DMA mode, since it hangs during the download of
firmware to the chip, which is only about 512 bytes long and DMA-aligned.
I had already added printk() calls and confirmed it.


> I would bet that the fundamental cause of your problem is the FIFO
> overrun.  With some more information about your setup and use of
> pxa2xx_spi, I might be able to provide more clues.  I would hesitate to
> simply patch the above assignments without first understanding why
> pump_transfers() is being executed out of sequence.

I agree; it is pointless to just add a NULL pointer check without
knowing the execution sequence that leads to the problem.

Thanks,
Keith


* Re: kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
       [not found]         ` <69f617130904051922w72810b52v576546c10c069941-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2009-04-08 15:19           ` Ned Forrester
  2009-04-11  2:23             ` [spi-devel-general] " Mok Keith
  0 siblings, 1 reply; 5+ messages in thread
From: Ned Forrester @ 2009-04-08 15:19 UTC (permalink / raw)
  To: Mok Keith
  Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW

Sorry for the delayed response.

Mok Keith wrote:
> Hi Ned,
> 
>> Is your application using the SSP in some unusual way that allows the RX
>> FIFO to overrun?  I am not familiar with any PXA9xx chips.  What clock
>> speed are you using?  What timeout setting are you using?  Are you using
>> power management with suspend/resume?
> 
> No, it does not allow RX FIFO overruns.
> 
> Here are the settings in the arch code:
> static struct pxa2xx_spi_chip libertas_spi = {
>         .tx_threshold           = 7,
>         .rx_threshold           = 8,
>         .cs_control             = libertas_spi_cs,
>         .dma_burst_size         = 8,
>         .timeout                = 230,
> };

Keep in mind that threshold is measured in "registers", of which there
are 16 in the FIFO (at least on PXA2xx devices), regardless of
byte-width, while burst_size is measured in bytes.  So if you are doing
16 bits_per_word (below), then the threshold should be 8 with a matching
burst of 16; these values each equal 1/2 of the FIFO.  Also, matching
the tx and rx thresholds at 8 and 8 makes more sense to me, however...

In DMA mode, the burst is significant, while the thresholds are ignored
and computed from burst in set_dma_burst_and_threshold().  If the
requested burst is more than 1/2 the FIFO for a given bits/word, then it
is reduced accordingly before computing threshold.  If you give it burst
of 8 at 16 bits/word, it will compute a matching rx_threshold of 4 and a
tx_threshold of 12 (not to be confused with the actual register values,
which are one less: 3 and 11).
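To make the arithmetic above concrete, the following is a simplified, stand-alone model of the computation described for set_dma_burst_and_threshold(). It is a sketch under stated assumptions (a 16-entry FIFO, bursts rounded up to 8, 16 or 32 bytes and capped at half the FIFO), not a copy of the driver function; the names are hypothetical.

```c
#include <assert.h>

/* Result of the model; thresholds are in FIFO entries ("registers"). */
struct sketch_thresholds {
	unsigned int burst_bytes;	/* burst actually used */
	unsigned int rx_thresh;		/* RX FIFO threshold, in entries */
	unsigned int tx_thresh;		/* TX FIFO threshold, in entries */
	unsigned int rx_reg;		/* value programmed: one less */
	unsigned int tx_reg;
};

static struct sketch_thresholds
sketch_burst_and_threshold(unsigned int bits_per_word, unsigned int req_burst)
{
	struct sketch_thresholds t;
	unsigned int bytes_per_word;
	unsigned int max_burst;

	assert(bits_per_word >= 4 && bits_per_word <= 32);
	bytes_per_word = bits_per_word <= 8 ? 1 : bits_per_word <= 16 ? 2 : 4;

	/* Never burst more than half of a 16-entry FIFO. */
	max_burst = 8 * bytes_per_word;

	/* Round the request up to 8, 16 or 32 bytes, then cap it. */
	t.burst_bytes = req_burst <= 8 ? 8 : req_burst <= 16 ? 16 : 32;
	if (t.burst_bytes > max_burst)
		t.burst_bytes = max_burst;

	/* RX threshold holds exactly one burst; TX leaves room for one. */
	t.rx_thresh = t.burst_bytes / bytes_per_word;
	t.tx_thresh = 16 - t.rx_thresh;
	t.rx_reg = t.rx_thresh - 1;
	t.tx_reg = t.tx_thresh - 1;
	return t;
}
```

With this model, sketch_burst_and_threshold(16, 8) yields thresholds of 4 (RX) and 12 (TX), programmed as 3 and 11, while a requested burst of 16 at 16 bits/word gives 8 and 8, half the FIFO each way.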

Try setting dma_burst_size to 16.  That will compute tx and rx
thresholds of 8, representing half the FIFO.  That is the way I normally
use the driver.  If there is a bug in the computation of burst and
threshold, then this might change the behavior.

--

You did not show your values for struct pxa2xx_spi_master.  I assume you
have enable_dma = 1 in this structure.

> Here are the settings in the driver:
>         spi->mode = SPI_MODE_0;
>         spi->max_speed_hz = 1000000; /* REVISIT max=50MHz */
>         spi->bits_per_word = 16;
>         ret = spi_setup(spi);
> 
> I set the speed to only 1MHz ("spi->max_speed_hz = 1000000;"), which
> should be very low.
> No power management is enabled; the problem happens at the very
> beginning, during firmware download to the chip.
> I noticed that if I increase the timeout to 1000, neither the panic nor
> the FIFO overrun happens. But the chip does not run after the firmware
> is downloaded (it is okay in pure PIO mode).

Timeout is an important setting.  It is used to clean up any trailing
bytes at the end of a transfer that were not handled by DMA (due to
transfer length not being divisible by burst-size, or whatever other
cause).  If the timeout is so short that it expires between words, then
spurious interrupts will be fielded and ignored; because you have 16
bits/word at 1MHz (16us per word), this might be happening if your
timeout is short.  If the timeout is too long, you waste time at the end
of any transfer with trailing bytes.

The difficult issue with the timeout is that it is not specified what
clock is counted within the chip to generate the timeout.  That may
seem strange, but the developer's manuals for the PXA255 and PXA270 say
that the clock used for timing is the "peripheral clock" but never say
what that clock is.  On the PXA255, running at 400MHz, I carefully
measured the clock (using long timeouts) to be 99.5MHz, which is
run-clock/4.  I have no idea what clock is used on a PXA3xx or PXA9xx.
At 99.5MHz, the default timeout setting of 1000 results in a 10usec
timeout, which is shorter than the time between your arriving 16-bit
words.  If you really use a value of 230, and *if* the PXA9xx uses a
similar clock to count from, then your timeout is only 2.3usec.  You
probably want to use a value of at least 10,000.
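The timing argument above reduces to comparing two durations: the time one word takes on the wire, and the timeout value converted through the counting clock. A minimal sketch, assuming Ned's measured 99.5MHz counting clock (which, as he says, may not apply to the PXA9xx) and continuous clocking at 1MHz; the helper names are hypothetical.

```c
#include <assert.h>

/* Time one word occupies on the wire, in microseconds. */
static double sketch_word_time_us(unsigned int bits_per_word, double sclk_hz)
{
	assert(sclk_hz > 0.0);
	return bits_per_word * 1e6 / sclk_hz;
}

/* Timeout register value converted to microseconds. */
static double sketch_timeout_us(unsigned int timeout, double count_clk_hz)
{
	assert(count_clk_hz > 0.0);
	return timeout * 1e6 / count_clk_hz;
}

/* Nonzero if the timeout can expire in the gap of a single word. */
static int sketch_timeout_too_short(unsigned int timeout, double count_clk_hz,
				    unsigned int bits_per_word, double sclk_hz)
{
	return sketch_timeout_us(timeout, count_clk_hz) <
	       sketch_word_time_us(bits_per_word, sclk_hz);
}
```

With these helpers, 1000 ticks come to about 10.05us and 230 ticks to about 2.3us, both shorter than the 16us word time, whereas 10,000 ticks (about 100.5us) are comfortably longer.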

I have no theory about why timeout interrupts might contribute to
receiver FIFO overruns in DMA mode, however.



-- 
Ned Forrester



* Re: [spi-devel-general] kernel-panic on pxa2xx_spi.c on pxa9xx cpu with dma enable
  2009-04-08 15:19           ` Ned Forrester
@ 2009-04-11  2:23             ` Mok Keith
  0 siblings, 0 replies; 5+ messages in thread
From: Mok Keith @ 2009-04-11  2:23 UTC (permalink / raw)
  To: Ned Forrester; +Cc: linux-arm-kernel, spi-devel-general

Hi Ned,

Thanks for your detailed description.
After increasing the timeout value to 1000, I no longer get DMA FIFO
overruns, but I still get no response from the SPI device after the
firmware is downloaded.
(I do get a response using pure I/O, i.e. enable_dma=0 and dma_burst_size=0.)

> Try setting dma_burst_size to 16.  That will compute tx and rx
> thresholds of 8, representing half the FIFO.  That is the way I normally
> use the driver.  If there is a bug in the computation of burst and
> threshold, then this might change the behavior.
I had already tried several combinations:
dma_burst_size=16, tx_threshold=7, rx_threshold=8
dma_burst_size=16, tx_threshold=1, rx_threshold=1
dma_burst_size=16, tx_threshold=0, rx_threshold=0
dma_burst_size=8, tx_threshold=0, rx_threshold=0
dma_burst_size=, tx_threshold=1, rx_threshold=1
dma_burst_size=16, tx_threshold=8, rx_threshold=8

All give the same result: after the firmware is downloaded to the SPI
device, there is no response.

> You did not show your values for struct pxa2xx_spi_master.  I assume you
> have enable_dma = 1 in this structure.
Yes, I did.


> On the PXA255, running at 400MHz, I carefully
> measured the clock (using long timeouts) to be 99.5MHz, which is
> run-clock/4.  I have no idea what clock is used on a PXA3xx or PXA9xx.
The PXA9xx manual states that the timeout equals value/26MHz.
So I increased it to 1000; there are no more DMA FIFO overruns now, but I
still cannot get the SPI device to work in DMA mode.
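Taking the 26MHz figure from the PXA9xx manual at face value, one can compute the smallest timeout value that spans a single 16-bit word at 1MHz. The sketch below (hypothetical helper name, integer arithmetic rounding up) gives 416 ticks, so 230 (about 8.8us) could expire between words while 1000 (about 38.5us) would not. That is one possible reading of why raising the timeout made the overruns disappear, not a confirmed diagnosis.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest timeout (in counting-clock ticks) that spans one word on
 * the wire: bits_per_word * count_clk_hz / sclk_hz, rounded up. */
static uint32_t sketch_min_timeout(uint32_t bits_per_word,
				   uint32_t count_clk_hz, uint32_t sclk_hz)
{
	uint64_t ticks;

	assert(sclk_hz > 0);
	ticks = (uint64_t)bits_per_word * count_clk_hz;
	return (uint32_t)((ticks + sclk_hz - 1) / sclk_hz);
}
```

The same helper applied to Ned's 99.5MHz PXA255 measurement gives 1592 ticks, consistent with his remark that the default of 1000 is shorter than one 16-bit word at 1MHz on that chip.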

The PXA9xx manual says that TXFIFO overruns and RXFIFO underruns are
silent errors: there is no indication of the overrun or underrun
condition other than missing data at the receiving end of the link. I
don't know whether I have fallen into this trap or not.

Any clues?

Keith
