linux-spi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: PXA270 SSP DMA Corruption
@ 2008-11-12 23:14 Ned Forrester
  0 siblings, 0 replies; 8+ messages in thread
From: Ned Forrester @ 2008-11-12 23:14 UTC (permalink / raw)
  To: spi-devel; +Cc: J. Scott Merritt

missed spi-devel on distribution...

-------- Original Message --------
Subject: Re: PXA270 SSP DMA Corruption
Date: Wed, 12 Nov 2008 18:10:01 -0500
From: Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org>
Organization: Woods Hole Oceanographic Institution
To: J. Scott Merritt <merrij3-IL7dBOYR4Vg@public.gmane.org>
CC: linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW@public.gmane.org,
stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR@public.gmane.org, David Brownell <david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org>
References: <20080211174339.73ca7ed5.merrij3-IL7dBOYR4Vg@public.gmane.org>
<47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> <20081107184819.54baa679.merrij3-IL7dBOYR4Vg@public.gmane.org>

J. Scott Merritt wrote:
> Long ago, I posted a message on linux-arm-kernel describing data
> corruption that I was seeing on PXA270 SSP transfers using DMA:
> http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2
> 
> I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL
> interface provided by spidev ... and unfortunately am still seeing
> data corruption with DMA transfers.
> 
> I have managed to reproduce the problem using the internal loopback
> facility of the PXA270 SSP hardware so it should be possible to
> test this problem in other environments.  On my system, the program
> below reports an error within 30-50 attempts.  From the data reported,
> it would appear the the DMA controller (or memory cache) is transferring
> the contents of the previous buffer rather than the current buffer.
> 
> I have also included the platform data that initializes the SSP
> device drivers.  Perhaps I have misconfigured or misued this in some
> way ???  Note that simply disabling DMA in the platform data allows
> the test to run indefinitely without errors.

[snip code]

Sorry for the delayed reply.

I have looked over your code, and tried to familiarize myself with
spidev.  I cannot easily test your code because I am still using 2.6.20,
and spidev is was not introduced until 2.6.22.  I don't see any obvious
problems with either your test code or spidev.  Certainly, spidev is of
similar complexity to pxa2xx_spi, so I would say that they are similarly
likely to have issues.  I don't know how much use spidev has gotten, but
it is probably more than pxa2xx_spi, because it works across platforms.

There are a couple of things to consider:

1. spidev ought to allocate "buffer" using the __GFP_DMA flag.  I think
you already showed that this is not causing your problem.  I think that
on ARM, all memory can be used for DMA; that is not true on all
architectures and some apparently will silently allocate bounce buffers
for non-dma-accessible memory.

2. spidev uses the same buffer for tx and rx.  That is supposed to be
allowed, but I don't think pxa2xx_spi handles this case correctly.
pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it
uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx
buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers
are the same.  Thus if they are the same, the memory is mapped twice.
"Linux Device Drivers", Corbet, et al. does not address this
possibility, but I bet it is not a good thing to do.

If you are willing, it looks simple to modify spidev to use two buffers,
and test whether that works better.  I agree that spidev should work
either way, but this would be a quick test.  If that works, I will
submit a patch for pxa2xx_spi to fix the case of same buffers.

Alternatively, you might prefer to try fixing map_dma_buffers() and
unmap_dma_buffers() in pxa2xx_spi.c to test this theory.  If it works,
that would be the proper fix, anyway.  Basically it needs to trap the
case of same address (or even overlapping ranges) and do one
dma_map_single using the DMA_BIDIRECTIONAL flag.  Unmapping has to use
the same flag.  I think I would choose to make it fail on overlapped but
unequal ranges, and perform correctly for the cases of separate or equal.

--

A couple of things that might improve your performance, but probably are
not the cause of the problem:

I notice that you don't set the bits_per_word in struct pxa2xx_spi_chip.
The default is 8.  Presumably that is what you want.

There is a transfer timeout that is built into pxa2xx_spi.c and that is
loaded to the timeout register in the SSP device.  It defaults to "1000"
arbitrary units (actually counts of an unspecified-by-Intel clock that
is measured to be 99.5MHz on the PXA255 NSSP port, but I don't know what
it is on the PXA270).  This is a very short time of 10usec, and the
result is extra interrupts to service during a transfer.  You might try
setting this value to 1,000,000 (10msec), or whatever makes sense in
your application.  You do this by changing the value of the
struct pxa2xx_spi_chip.timeout parameter.  You have the default value of
1000 in that location.

The values of 12 and 4 for the thresholds are sub-optimal.  They should
probably be 8 and 8 for DMA.  A patch has already been submitted by
Vernon Sauder (it will appear in 2.6.28) to fix the documentation, and
variously improve the defaults for unspecified values in struct
pxa2xx_spi_chip.

-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079


-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PXA270 SSP DMA Corruption
       [not found]               ` <20081113120134.70d533c8.merrij3-IL7dBOYR4Vg@public.gmane.org>
@ 2008-11-13 18:54                 ` Ned Forrester
  0 siblings, 0 replies; 8+ messages in thread
From: Ned Forrester @ 2008-11-13 18:54 UTC (permalink / raw)
  To: J. Scott Merritt; +Cc: spi-devel

J. Scott Merritt wrote:
> On Wed, 12 Nov 2008 22:19:49 -0500
> Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote:
> 
>> It says "peripheral clock frequency" without ever defining that phrase
>> (at least in the 1/2006 version of the developer's manual).  Oddly on
>> PXA255 this frequency appears to be runclock/4 = 99.5MHz for a 400MHz
>> machine.  I bet it is not as high as 312MHz, but you can measure it by
>> setting very long times, forcing the timeout (no tx data in pio mode, I
>> think), and perhaps using GPIO probes to trigger a scope.  I did
>> something like that on my system.  It would sure be nice to know the
>> answer to that, as no one using PXA270 has ever answered the question.
>>
>>> Table 8-4 in my PXA manual is titled "TFT and RFT values with possible
>>> DMA Burst Sizes" ... for 8 bit data, it says ... for TFT > 7
>>> "Do not use DMA" ...  Am I misinterpreting this ?  I -think- that means
>>> that 8/8 is not a good choice ...
> 
> My manual seemed to define "peripheral clock" in multiple places with
> multiple (conflicting) frequencies (or options) ;)

That sounds about like what I found.

> I regret that I am probably not able to attempt the clock measurement
> effort at this time - too many other pressures .... If you are really
> anxious to resolve this, I might be able to loan you a PXA270 system
> for a couple of weeks.  Alternatively, I might be able to find time
> for a couple of Kernel builds/loads if you wanted to build all of the
> test programs and patches ...

Excessive busyness affects all of us.  I must decline.

>> The table you refer to lists the values loaded into the TFT and RFT bit
>> fields, which are the desired threshold-1.  The values passed to
>> pxa2xx_spi are the desired thresholds, so TFT>7 is the same as threshold>8.
> 
> You are correct ... of course ... :)
> 
>>>> With respect to the pxa2xx_spi patch, please proceed - I will certainly
>>> not attempt to generate one myself.
>> I'll get something out in the next couple of days.
> 
> Any chance we could get it accepted as a repair into 2.6.27 ?

Yes, I plan to submit the patch to the stable tree, also.  It usually
takes a few weeks to propagate.

-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PXA270 SSP DMA Corruption
       [not found]         ` <20081112213403.402948b9.merrij3-IL7dBOYR4Vg@public.gmane.org>
@ 2008-11-13  3:19           ` Ned Forrester
       [not found]             ` <20081113120134.70d533c8.merrij3@rpi.edu>
  0 siblings, 1 reply; 8+ messages in thread
From: Ned Forrester @ 2008-11-13  3:19 UTC (permalink / raw)
  To: J. Scott Merritt; +Cc: spi-devel

J. Scott Merritt wrote:
> On Wed, 12 Nov 2008 18:10:01 -0500
> It appears that the timeout is computed based upon the "Peripheral Clock
> frequency" on the PXA270 - which would appear to be 312Mhz, (or something
> divided down from there).  If it is 312Mhz, then for my SSP speed of
> 300K, I guess I need something around 10,000.

It says "peripheral clock frequency" without ever defining that phrase
(at least in the 1/2006 version of the developer's manual).  Oddly on
PXA255 this frequency appears to be runclock/4 = 99.5MHz for a 400MHz
machine.  I bet it is not as high as 312MHz, but you can measure it by
setting very long times, forcing the timeout (no tx data in pio mode, I
think), and perhaps using GPIO probes to trigger a scope.  I did
something like that on my system.  It would sure be nice to know the
answer to that, as no one using PXA270 has ever answered the question.

> Table 8-4 in my PXA manual is titled "TFT and RFT values with possible
> DMA Burst Sizes" ... for 8 bit data, it says ... for TFT > 7
> "Do not use DMA" ...  Am I misinterpreting this ?  I -think- that means
> that 8/8 is not a good choice ...

The table you refer to lists the values loaded into the TFT and RFT bit
fields, which are the desired threshold-1.  The values passed to
pxa2xx_spi are the desired thresholds, so TFT>7 is the same as threshold>8.

> With respect to the pxa2xx_spi patch, please proceed - I will certainly
> not attempt to generate one myself.

I'll get something out in the next couple of days.

-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PXA270 SSP DMA Corruption
       [not found]               ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org>
@ 2008-11-13  1:59                 ` Ned Forrester
  0 siblings, 0 replies; 8+ messages in thread
From: Ned Forrester @ 2008-11-13  1:59 UTC (permalink / raw)
  To: J. Scott Merritt
  Cc: david-b-yBeKhBN/0LDR7s880joybQ,
	spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW,
	stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR

Ned Forrester wrote:
> J. Scott Merritt wrote:
>> On Wed, 12 Nov 2008 18:10:01 -0500
>> Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote:
>>
>>> 2. spidev uses the same buffer for tx and rx.  That is supposed to be
>>> allowed, but I don't think pxa2xx_spi handles this case correctly.
>>> pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it
>>> uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx
>>> buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers
>>> are the same.  Thus if they are the same, the memory is mapped twice.
>>> "Linux Device Drivers", Corbet, et al. does not address this
>>> possibility, but I bet it is not a good thing to do.
>>>
>>> If you are willing, it looks simple to modify spidev to use two buffers,
>>> and test whether that works better.  I agree that spidev should work
>>> either way, but this would be a quick test.  If that works, I will
>>> submit a patch for pxa2xx_spi to fix the case of same buffers.
>>>
>>> Alternatively, you might prefer to try fixing map_dma_buffers() and
>>> unmap_dma_buffers() in pxa2xx_spi.c to test this theory.  If it works,
>>> that would be the proper fix, anyway.  Basically it needs to trap the
>>> case of same address (or even overlapping ranges) and do one
>>> dma_map_single using the DMA_BIDIRECTIONAL flag.  Unmapping has to use
>>> the same flag.  I think I would choose to make it fail on overlapped but
>>> unequal ranges, and perform correctly for the cases of separate or equal.
>>>
>> [snip performance suggestions]
>>
>> Ned,  Thank for your thorough and thoughtful review.
>>
>> It looks like your concerns regarding the duplicate mapping of the
>> overlapping buffers were correct.  I tried both suggestions - namely,
>> using two buffers in spidev, as well as cutting back to a single buffer
>> mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c.  Each of these changes
>> (by themselves) seemed to eliminate the data corruption in my simple
>> test program !
> 
> That's good.  I will write a patch for pxa2xx_spi, unless you would
> rather do it.
> 
>> However, I have some lingering concerns regarding the latter solution.
>> >From what little I have read, it appears the DMA-BIDIRECTIONAL is intended
>> for situations where the transfer direction is not know ... it is not
>> immediately clear (to me) that it also handles our case, which might be
>> better described as DMA_SIMULTANEOUS.  It is possible that using
>> DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding
>> of the buffering and caching that is going on between the two independent
>> DMA channel it is difficult for me to speculate.
> 
> According to "Linux Device Drivers":
> 
> "If data can move in either direction, use DMA_BIDIRECTIONAL."
> 
> and
> 
> "Incidentally, bounce buffers are one reason why it is important to get
> the direction right. DMA_BIDIRECTIONAL bounce buffers are copied both
> before and after the operation, which is often an unnecessary waste of
> CPU cycles."
> 
>>From the general discussion in the book, it is clear that dma mapping
> takes care of issues like flushing cache (for DMA_TO_DEVICE),
> invalidating cache (for DMA_FROM_DEVICE), setting up any I/O memory
> mapping and scatter/gather, etc.  Once stream mapped (which
> dma_map_single() does), the CPU is not allowed to touch the memory, and
> once unmapped, the device cannot touch the memory.  I think it is clear
> that DMA_BIDIRECTIONAL is for exactly the case where the device will do
> some DMA in each direction before giving the memory back to the CPU.

I should have added that the logic of SPI is that for every byte written
DMA tx), there is a byte read (DMA rx).  Because the clocks that read
are generated by the clocks that transmit, the tx word has to be in the
FIFO before the rx word is returned to the FIFO.  Thus the tx and rx DMA
channels will never access the same word at the same time, if the tx and
rx buffers start at the same address.

-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PXA270 SSP DMA Corruption
       [not found]           ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org>
@ 2008-11-13  1:48             ` Ned Forrester
       [not found]               ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: Ned Forrester @ 2008-11-13  1:48 UTC (permalink / raw)
  To: J. Scott Merritt
  Cc: david-b-yBeKhBN/0LDR7s880joybQ,
	spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW,
	stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR

J. Scott Merritt wrote:
> On Wed, 12 Nov 2008 18:10:01 -0500
> Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote:
> 
>> 2. spidev uses the same buffer for tx and rx.  That is supposed to be
>> allowed, but I don't think pxa2xx_spi handles this case correctly.
>> pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it
>> uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx
>> buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers
>> are the same.  Thus if they are the same, the memory is mapped twice.
>> "Linux Device Drivers", Corbet, et al. does not address this
>> possibility, but I bet it is not a good thing to do.
>>
>> If you are willing, it looks simple to modify spidev to use two buffers,
>> and test whether that works better.  I agree that spidev should work
>> either way, but this would be a quick test.  If that works, I will
>> submit a patch for pxa2xx_spi to fix the case of same buffers.
>>
>> Alternatively, you might prefer to try fixing map_dma_buffers() and
>> unmap_dma_buffers() in pxa2xx_spi.c to test this theory.  If it works,
>> that would be the proper fix, anyway.  Basically it needs to trap the
>> case of same address (or even overlapping ranges) and do one
>> dma_map_single using the DMA_BIDIRECTIONAL flag.  Unmapping has to use
>> the same flag.  I think I would choose to make it fail on overlapped but
>> unequal ranges, and perform correctly for the cases of separate or equal.
>>
> [snip performance suggestions]
> 
> Ned,  Thank for your thorough and thoughtful review.
> 
> It looks like your concerns regarding the duplicate mapping of the
> overlapping buffers were correct.  I tried both suggestions - namely,
> using two buffers in spidev, as well as cutting back to a single buffer
> mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c.  Each of these changes
> (by themselves) seemed to eliminate the data corruption in my simple
> test program !

That's good.  I will write a patch for pxa2xx_spi, unless you would
rather do it.

> However, I have some lingering concerns regarding the latter solution.
>>From what little I have read, it appears the DMA-BIDIRECTIONAL is intended
> for situations where the transfer direction is not know ... it is not
> immediately clear (to me) that it also handles our case, which might be
> better described as DMA_SIMULTANEOUS.  It is possible that using
> DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding
> of the buffering and caching that is going on between the two independent
> DMA channel it is difficult for me to speculate.

According to "Linux Device Drivers":

"If data can move in either direction, use DMA_BIDIRECTIONAL."

and

"Incidentally, bounce buffers are one reason why it is important to get
the direction right. DMA_BIDIRECTIONAL bounce buffers are copied both
before and after the operation, which is often an unnecessary waste of
CPU cycles."

>From the general discussion in the book, it is clear that dma mapping
takes care of issues like flushing cache (for DMA_TO_DEVICE),
invalidating cache (for DMA_FROM_DEVICE), setting up any I/O memory
mapping and scatter/gather, etc.  Once stream mapped (which
dma_map_single() does), the CPU is not allowed to touch the memory, and
once unmapped, the device cannot touch the memory.  I think it is clear
that DMA_BIDIRECTIONAL is for exactly the case where the device will do
some DMA in each direction before giving the memory back to the CPU.

-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PXA270 SSP DMA Corruption
       [not found]       ` <491B6249.7070407-/d+BM93fTQY@public.gmane.org>
@ 2008-11-13  1:24         ` J. Scott Merritt
       [not found]           ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: J. Scott Merritt @ 2008-11-13  1:24 UTC (permalink / raw)
  To: Ned Forrester
  Cc: david-b-yBeKhBN/0LDR7s880joybQ,
	spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW,
	stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR

On Wed, 12 Nov 2008 18:10:01 -0500
Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote:

> J. Scott Merritt wrote:
> > Long ago, I posted a message on linux-arm-kernel describing data
> > corruption that I was seeing on PXA270 SSP transfers using DMA:
> > http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2
> > 
> > I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL
> > interface provided by spidev ... and unfortunately am still seeing
> > data corruption with DMA transfers.
> > 
> > I have managed to reproduce the problem using the internal loopback
> > facility of the PXA270 SSP hardware so it should be possible to
> > test this problem in other environments.  On my system, the program
> > below reports an error within 30-50 attempts.  From the data reported,
> > it would appear the the DMA controller (or memory cache) is transferring
> > the contents of the previous buffer rather than the current buffer.
> > 
> > I have also included the platform data that initializes the SSP
> > device drivers.  Perhaps I have misconfigured or misued this in some
> > way ???  Note that simply disabling DMA in the platform data allows
> > the test to run indefinitely without errors.
> 
> [snip code]
> 
> Sorry for the delayed reply.
> 
> I have looked over your code, and tried to familiarize myself with
> spidev.  I cannot easily test your code because I am still using 2.6.20,
> and spidev is was not introduced until 2.6.22.  I don't see any obvious
> problems with either your test code or spidev.  Certainly, spidev is of
> similar complexity to pxa2xx_spi, so I would say that they are similarly
> likely to have issues.  I don't know how much use spidev has gotten, but
> it is probably more than pxa2xx_spi, because it works across platforms.
> 
> There are a couple of things to consider:
> 
> 1. spidev ought to allocate "buffer" using the __GFP_DMA flag.  I think
> you already showed that this is not causing your problem.  I think that
> on ARM, all memory can be used for DMA; that is not true on all
> architectures and some apparently will silently allocate bounce buffers
> for non-dma-accessible memory.
> 
> 2. spidev uses the same buffer for tx and rx.  That is supposed to be
> allowed, but I don't think pxa2xx_spi handles this case correctly.
> pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it
> uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx
> buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers
> are the same.  Thus if they are the same, the memory is mapped twice.
> "Linux Device Drivers", Corbet, et al. does not address this
> possibility, but I bet it is not a good thing to do.
> 
> If you are willing, it looks simple to modify spidev to use two buffers,
> and test whether that works better.  I agree that spidev should work
> either way, but this would be a quick test.  If that works, I will
> submit a patch for pxa2xx_spi to fix the case of same buffers.
> 
> Alternatively, you might prefer to try fixing map_dma_buffers() and
> unmap_dma_buffers() in pxa2xx_spi.c to test this theory.  If it works,
> that would be the proper fix, anyway.  Basically it needs to trap the
> case of same address (or even overlapping ranges) and do one
> dma_map_single using the DMA_BIDIRECTIONAL flag.  Unmapping has to use
> the same flag.  I think I would choose to make it fail on overlapped but
> unequal ranges, and perform correctly for the cases of separate or equal.
> 
[snip performance suggestions]

Ned,  Thank for your thorough and thoughtful review.

It looks like your concerns regarding the duplicate mapping of the
overlapping buffers were correct.  I tried both suggestions - namely,
using two buffers in spidev, as well as cutting back to a single buffer
mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c.  Each of these changes
(by themselves) seemed to eliminate the data corruption in my simple
test program !

However, I have some lingering concerns regarding the latter solution.
>From what little I have read, it appears the DMA-BIDIRECTIONAL is intended
for situations where the transfer direction is not know ... it is not
immediately clear (to me) that it also handles our case, which might be
better described as DMA_SIMULTANEOUS.  It is possible that using
DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding
of the buffering and caching that is going on between the two independent
DMA channel it is difficult for me to speculate.

Many thanks, Scott.













-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: PXA270 SSP DMA Corruption
       [not found]       ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org>
@ 2008-11-07 18:59         ` Ned Forrester
  0 siblings, 0 replies; 8+ messages in thread
From: Ned Forrester @ 2008-11-07 18:59 UTC (permalink / raw)
  To: J. Scott Merritt; +Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f

J. Scott Merritt wrote:
> Long ago, I posted a message on linux-arm-kernel describing data corruption
> that I was seeing on PXA270 SSP transfers using DMA:
> http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2
> 
> I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL
> interface provided by spidev ... and unfortunately am still seeing
> data corruption with DMA transfers.

pxa2xx_spi.c has been pretty well wrung out by various people.  I worked
with Stephen preparing fixes that I think were introduced in 2.6.20.
While my task is receive-only, and it looks like your task is
transmit-only, I did a lot of loopback testing at that time, without the
sort of trouble you report.  Since then Stephen seems to have moved on
to other interests.

> I have a user program that is sending single blocks (1-255 words) of
> fabricated, non-zero data to an outbound processor (at 300 kHz).  I
> find that after a small number of blocks (10-20), the outboard processor
> will receive a packet that begins with multiple zeros, rather than the
> intended data.
> 
> I have put printk statements into spidev that show that the correct,
> non-zero data is always present in the DMA bounce buffer before and
> after the transfer - even on the packets that fail to arrive.

I assume you are still talking about spi_char as the protocol driver.
Right?

You may be having trouble with the way the buffers are allocated.  I
think, but am not sure, that PXA wants the buffers aligned on 32-bit
boundaries.  I note that spi_char statically allocates the buffers in
the middle of the struct spichar_driver.  You might want to allocate the
buffers separately, and change struct spichar_driver to have only
pointers to the separately allocated buffers.  Then....

In my protocol driver, I allocate my buffers with a few more flags:

d_buf = (char *)kzalloc(count, GFP_KERNEL | __GFP_DMA | __GFP_COLD);

I think the important one is probably __GFP_DMA.  I'm not sure how
important that is on PXA, because I think you can DMA from any memory,
but it may enforce needed alignment.  At least alignment checks never
fail when I do this, but perhaps they would not anyway.

Also note that if pxa2xx_spi discovers that the allocated buffers are
not aligned on a 32-bit boundary or if dma_map_single fails, it will
silently perform the transfer in PIO mode.

You can also try doing the dma mapping in the protocol driver, and
passing the mapped address in struct spi_transfer->tx_buf.  David
Brownell pointed out to me once that this is more efficient, because it
allows the kernel to free the cache line associated with the buffer
sooner than if it is left to pxa2xx_spi.  If you do map in the protocol
driver, don't forget to unmap after the transfer is given back from
pxa2xx_spi.

> If I display the DMA flag in the platform data, then the transfers
> continue indefinitely without any problems.  More interestingly, while
> using DMA, if I simply insert a sched_yield (on an otherwise unoccupied
> processor), before each transfer from user land, the problem
> disappears !
> 
> In my original posting last year, I suspected a cache coherency problem.
> However, based on the latest symptoms, I'm wondering if the SSP
> transfer might be initiated before the DMA controller is ready/able
> to provide source data ....

Anything is possible, but I doubt that there is any such problem in
pxa2xx_spi.  Let us know what happens after changing the allocation of
the buffers.

-- 
Ned Forrester                                       nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab                                  508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution          Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* PXA270 SSP DMA Corruption
       [not found]   ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org>
@ 2008-11-07 16:43     ` J. Scott Merritt
       [not found]       ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org>
  0 siblings, 1 reply; 8+ messages in thread
From: J. Scott Merritt @ 2008-11-07 16:43 UTC (permalink / raw)
  To: Ned Forrester
  Cc: David Brownell,
	spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f,
	stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR

Long ago, I posted a message on linux-arm-kernel describing data corruption
that I was seeing on PXA270 SSP transfers using DMA:
http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2

I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL
interface provided by spidev ... and unfortunately am still seeing
data corruption with DMA transfers.

I have a user program that is sending single blocks (1-255 words) of
fabricated, non-zero data to an outbound processor (at 300 kHz).  I
find that after a small number of blocks (10-20), the outboard processor
will receive a packet that begins with multiple zeros, rather than the
intended data.

I have put printk statements into spidev that show that the correct,
non-zero data is always present in the DMA bounce buffer before and
after the transfer - even on the packets that fail to arrive.

If I display the DMA flag in the platform data, then the transfers
continue indefinitely without any problems.  More interestingly, while
using DMA, if I simply insert a sched_yield (on an otherwise unoccupied
processor), before each transfer from user land, the problem
disappears !

In my original posting last year, I suspected a cache coherency problem.
However, based on the latest symptoms, I'm wondering if the SSP
transfer might be initiated before the DMA controller is ready/able
to provide source data ....

Thanks, Scott.


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-11-13 18:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-12 23:14 PXA270 SSP DMA Corruption Ned Forrester
  -- strict thread matches above, loose matches on Subject: below --
2008-02-11 22:43 PXA270 SSPSFRM gates chip select ? J. Scott Merritt
2008-02-11 23:26 ` Ned Forrester
     [not found]   ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org>
2008-11-07 16:43     ` PXA270 SSP DMA Corruption J. Scott Merritt
     [not found]       ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org>
2008-11-07 18:59         ` Ned Forrester
     [not found]   ` <20081107184819.54baa679.merrij3@rpi.edu>
     [not found]     ` <491B6249.7070407@whoi.edu>
     [not found]       ` <491B6249.7070407-/d+BM93fTQY@public.gmane.org>
2008-11-13  1:24         ` J. Scott Merritt
     [not found]           ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org>
2008-11-13  1:48             ` Ned Forrester
     [not found]               ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org>
2008-11-13  1:59                 ` Ned Forrester
     [not found]       ` <20081112213403.402948b9.merrij3@rpi.edu>
     [not found]         ` <20081112213403.402948b9.merrij3-IL7dBOYR4Vg@public.gmane.org>
2008-11-13  3:19           ` Ned Forrester
     [not found]             ` <20081113120134.70d533c8.merrij3@rpi.edu>
     [not found]               ` <20081113120134.70d533c8.merrij3-IL7dBOYR4Vg@public.gmane.org>
2008-11-13 18:54                 ` Ned Forrester

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).