* PXA270 SSPSFRM gates chip select ? @ 2008-02-11 22:43 J. Scott Merritt [not found] ` <20080211174339.73ca7ed5.merrij3-IL7dBOYR4Vg@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: J. Scott Merritt @ 2008-02-11 22:43 UTC (permalink / raw) To: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Reposted from linux-arm-kernel mailing list ... at the suggestion of David Brownell. Using pxa2xx_spi from Kernel 2.6.21. PXA270 is SSP Master in SPI_MODE_3 (SPO=1, SPH=1) with internal clock and GPIO's used as external chip selects. On the very first message after boot, I am receiving an extra clock pulse at the slave device (causing the whole message to be "shifted"). It appears that on the first message, the Chip Select is activated before the SPO=1/SPH=1 is fully established in the PXA's SSP hardware, resulting in an extra, positive-going transition of SSPSCLK as it attempts to establish the proper (high) clock level for the SPH=1 setting. Testing was performed on Kernal 2.6.21, but it appears that 2.6.24 also performs the chip select call before updating SSCR0 & SSCR1. Slave device is NXP LPC2366. I have tried setting the "default" in pxa2xx_spi.c to SPO=1, SPH=1; but still have the clock riding low before the first chip select. ... David Brownell responds: Actually, the SPI_CPOL mode bit controls the clock polarity before the chip select coes active: CPOL=0 should mean it's forced low (if it isn't already low). So if that driver is doing chipselect *then* clock setup, it's doing the wrong thing. A patch would be appreciated... ... Thanks, Scott. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20080211174339.73ca7ed5.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <20080211174339.73ca7ed5.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-02-11 22:54 ` Zik Saleeba [not found] ` <33e9dd1c0802111454k5deeaa38o9d21cee610b79da7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-11 23:26 ` Ned Forrester 1 sibling, 1 reply; 23+ messages in thread From: Zik Saleeba @ 2008-02-11 22:54 UTC (permalink / raw) To: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Cc: stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR FWIW I'm using the pxa2xx_spi driver from 2.6.23.9 on a pxa270 and I'm not experiencing any problems of this type. One thought - some manufacturer's interpretations of SPI invert the sense of the clock so maybe the problem is related to the chip he's talking to rather than the pxa driver. While we're on the subject of the pxa2xx_spi driver, I've found that it has incredibly slow throughput. The problem appears to be related to the use of a tasklet in pumping messages. The initial setup and transfer is quick but then the tasklet is started and a context switch causes terrible latency. It takes around a millisecond for the tasklet to get control at which point the chip select is released and the SPI bus is available for the next transfer. This means that running as hard as it can the bus is idle 99% of the time. I've experimented with removing the tasklet from the driver and it does improve the situation markedly. Cheers, Zik On Feb 12, 2008 9:43 AM, J. Scott Merritt <merrij3-IL7dBOYR4Vg@public.gmane.org> wrote: > Reposted from linux-arm-kernel mailing list ... at the suggestion > of David Brownell. > > Using pxa2xx_spi from Kernel 2.6.21. PXA270 is SSP Master in > SPI_MODE_3 (SPO=1, SPH=1) with internal clock and GPIO's used as > external chip selects. > > On the very first message after boot, I am receiving an extra clock > pulse at the slave device (causing the whole message to be "shifted"). > > It appears that on the first message, the Chip Select is activated > before the SPO=1/SPH=1 is fully established in the PXA's SSP hardware, > resulting in an extra, positive-going transition of SSPSCLK as it > attempts to establish the proper (high) clock level for the SPH=1 > setting. > > Testing was performed on Kernal 2.6.21, but it appears that 2.6.24 > also performs the chip select call before updating SSCR0 & SSCR1. > Slave device is NXP LPC2366. > > I have tried setting the "default" in pxa2xx_spi.c to SPO=1, SPH=1; > but still have the clock riding low before the first chip select. > > ... David Brownell responds: > > Actually, the SPI_CPOL mode bit controls the clock polarity > before the chip select coes active: CPOL=0 should mean it's > forced low (if it isn't already low). > > So if that driver is doing chipselect *then* clock setup, > it's doing the wrong thing. A patch would be appreciated... > > ... > Thanks, Scott. > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > spi-devel-general mailing list > spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org > https://lists.sourceforge.net/lists/listinfo/spi-devel-general > ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <33e9dd1c0802111454k5deeaa38o9d21cee610b79da7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <33e9dd1c0802111454k5deeaa38o9d21cee610b79da7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2008-02-12 2:51 ` David Brownell [not found] ` <200802111851.10155.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: David Brownell @ 2008-02-12 2:51 UTC (permalink / raw) To: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Cc: stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Monday 11 February 2008, Zik Saleeba wrote: > While we're on the subject of the pxa2xx_spi driver, I've found that > it has incredibly slow throughput. The problem appears to be related to > the use of a tasklet in pumping messages. The initial setup and > transfer is quick but then the tasklet is started and a context switch > causes terrible latency. It takes around a millisecond for the tasklet > to get control at which point the chip select is released and the SPI > bus is available for the next transfer. This means that running as > hard as it can the bus is idle 99% of the time. I've experimented with > removing the tasklet from the driver and it does improve the situation > markedly. Got patch? :) - Dave ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <200802111851.10155.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <200802111851.10155.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> @ 2008-02-12 3:15 ` Zik Saleeba [not found] ` <33e9dd1c0802111915q48cb80ecxb33461a9263f9295-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-12 3:24 ` Ned Forrester 1 sibling, 1 reply; 23+ messages in thread From: Zik Saleeba @ 2008-02-12 3:15 UTC (permalink / raw) To: David Brownell Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Feb 12, 2008 1:51 PM, David Brownell <david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> wrote: > > Got patch? :) I'm using a version which I've backported to a slightly earlier kernel so it's not entirely clean. I'll try to sort out a clean patch shortly. Incidentally I've also written a driver for the NXP sc16is752 dual UART (connects via SPI). Is anyone interested in that? Cheers, Zik ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <33e9dd1c0802111915q48cb80ecxb33461a9263f9295-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <33e9dd1c0802111915q48cb80ecxb33461a9263f9295-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2008-02-12 4:02 ` David Brownell 0 siblings, 0 replies; 23+ messages in thread From: David Brownell @ 2008-02-12 4:02 UTC (permalink / raw) To: Zik Saleeba Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Monday 11 February 2008, Zik Saleeba wrote: > Incidentally I've also written a driver for the NXP sc16is752 dual > UART (connects via SPI). Is anyone interested in that? Rule of thumb: We're always interested in having new Linux drivers. :) Even if nobody else uses it today, tomorrow may be a different story. In this case, while the SPI list is probably the best place to get a review of its SPI capability (first?), I'd suggest that LKML would be the best place to get feedback on how it talks to the serial driver framework. Alan Cox has been doing a bunch of long-overdue cleanup to that framework, so there may be updated "best practices" to know. - Dave p.s. Speaking of drivers for slightly atypical devices ... 2.6.25-rc1 added drivers/net/enc28j60.c which is a 10baseT Ethernet driver, communicating via SPI. A SPI driver for the n9604 USB peripheral chip would be fun too! ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <200802111851.10155.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 2008-02-12 3:15 ` Zik Saleeba @ 2008-02-12 3:24 ` Ned Forrester [not found] ` <47B11178.6090904-/d+BM93fTQY@public.gmane.org> 1 sibling, 1 reply; 23+ messages in thread From: Ned Forrester @ 2008-02-12 3:24 UTC (permalink / raw) To: David Brownell Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR David Brownell wrote: > On Monday 11 February 2008, Zik Saleeba wrote: >> While we're on the subject of the pxa2xx_spi driver, I've found that >> it has incredibly slow throughput. The problem appears to be related to >> the use of a tasklet in pumping messages. The initial setup and >> transfer is quick but then the tasklet is started and a context switch >> causes terrible latency. It takes around a millisecond for the tasklet >> to get control at which point the chip select is released and the SPI >> bus is available for the next transfer. This means that running as >> hard as it can the bus is idle 99% of the time. I've experimented with >> removing the tasklet from the driver and it does improve the situation >> markedly. > > Got patch? :) > > - Dave I thought it was generally considered good practice to separate interrupt service into a top-half that executes only the code required to satisfy the short-term hardware requirements, and a bottom-half that is executed in a tasklet and that executes all data related code that is not required to be executed immediately. This separation allows interrupts to be re-enabled as quickly as possible so that interrupt latency for other parts of the kernel is not impacted any more than necessary. Is this not still the practice in the modern kernel? I assume that "removing the tasklet" means calling pump_transfers() directly from the interrupt service routines, rather than having the ISRs schedule a tasklet to make that call. Right? I have been working with this driver for a couple of years and have not noticed this problem, though I guess in my application it takes 3ms for DMA to load a 4096byte buffer, so I might not have noticed an overall effect. However, during earlier work, 95% of the time I was able to service the DMA interrupt and set up the next DMA (requiring the tasklet to run) within the 366us that would have caused a hardware fifo overrun. That would indicate that tasklet execution was occurring in far less than 1ms most of the time. This is on a 400MHz PXA255. The latency of tasklet service must depend on what else the processor is doing. You might want to look at what else your system is doing, and whether there are any scheduling parameters that might make sense to adjust. I believe tasklets will never run later than the next timer tick, which I believe is 1ms on most moder processors (but which can be changed). Thus 1ms should be the maximum latency; I would expect better than 1ms most of the time. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <47B11178.6090904-/d+BM93fTQY@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <47B11178.6090904-/d+BM93fTQY@public.gmane.org> @ 2008-02-12 3:48 ` Zik Saleeba [not found] ` <33e9dd1c0802111948u2256d0adj8caa478073795d78-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-12 4:20 ` David Brownell 1 sibling, 1 reply; 23+ messages in thread From: Zik Saleeba @ 2008-02-12 3:48 UTC (permalink / raw) To: Ned Forrester Cc: David Brownell, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Feb 12, 2008 2:24 PM, Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: > > I thought it was generally considered good practice... Yes, it probably is good practice. Unfortunately the tasklet seemed to be causing performance issues which made the driver essentially unusable for my application. I'm working with a serial chip which requires large numbers of small SPI transfers (several register reads etc. via SPI on each interrupt). If each of these transfers takes a millisecond it becomes impossible to service even a single fairly slow serial connection. I have to service 8 relatively fast serial ports so I can't put up with 99% SPI unavailability. I'm using an earlier kernel (2.6.16) which I've back-ported the latest SPI code so it's possible that tasklets work better in more recent kernels. Anyone know if that might be true? > I assume that "removing the tasklet" means calling pump_transfers() > directly from the interrupt service routines, rather than having the > ISRs schedule a tasklet to make that call. Right? Not exactly. pump_transfers() is called from pump_messages() and a few other places, all of which run in a workqueue. So it's not called from interrupt context but from a workqueue. > I believe tasklets will never run later than the next timer > tick, which I believe is 1ms on most moder processors (but which can be > changed). Thus 1ms should be the maximum latency; I would expect better > than 1ms most of the time. I seem to be seeing 1ms consistently on a Compulab cm-x270 - or at least I did until I made this change. Cheers, Zik ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <33e9dd1c0802111948u2256d0adj8caa478073795d78-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <33e9dd1c0802111948u2256d0adj8caa478073795d78-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2008-02-12 4:16 ` David Brownell 2008-02-12 4:19 ` Zik Saleeba 2008-02-12 4:43 ` Ned Forrester 2 siblings, 0 replies; 23+ messages in thread From: David Brownell @ 2008-02-12 4:16 UTC (permalink / raw) To: Zik Saleeba Cc: Ned Forrester, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Monday 11 February 2008, Zik Saleeba wrote: > I'm using an earlier kernel (2.6.16) which I've back-ported the latest > SPI code so it's possible that tasklets work better in more recent > kernels. Anyone know if that might be true? I wouldn't think so ... but the most accurate answer would come from your own testing. 2.6.24+ should run on your cm-x270; can you do that the test on that kernel too? - Dave ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <33e9dd1c0802111948u2256d0adj8caa478073795d78-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-12 4:16 ` David Brownell @ 2008-02-12 4:19 ` Zik Saleeba 2008-02-12 4:43 ` Ned Forrester 2 siblings, 0 replies; 23+ messages in thread From: Zik Saleeba @ 2008-02-12 4:19 UTC (permalink / raw) To: Ned Forrester Cc: David Brownell, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Feb 12, 2008 2:48 PM, Zik Saleeba <zik-fsgeVU6Z5FysTnJN9+BGXg@public.gmane.org> wrote: > > pump_transfers() is called from pump_messages() and a few > other places, all of which run in a workqueue. So it's not called from > interrupt context but from a workqueue. I just had a closer look at this and it's not true in all cases. It can also be invoked from dma_transfer_complete() which is reached in interrupt context. This is probably the case which is improving my performance so much. So my change is bad in that it increases the maximum interrupt latency. It's also the only way I know to get decent SPI utilisation when you're doing large numbers of small transfers (as I am). Is there a better way of doing this? Cheers, Zik ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <33e9dd1c0802111948u2256d0adj8caa478073795d78-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-12 4:16 ` David Brownell 2008-02-12 4:19 ` Zik Saleeba @ 2008-02-12 4:43 ` Ned Forrester [not found] ` <47B12406.9040208-/d+BM93fTQY@public.gmane.org> 2 siblings, 1 reply; 23+ messages in thread From: Ned Forrester @ 2008-02-12 4:43 UTC (permalink / raw) To: Zik Saleeba Cc: David Brownell, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Zik Saleeba wrote: > On Feb 12, 2008 2:24 PM, Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: >> I thought it was generally considered good practice... > > Yes, it probably is good practice. Unfortunately the tasklet seemed to > be causing performance issues which made the driver essentially > unusable for my application. I'm working with a serial chip which > requires large numbers of small SPI transfers (several register reads > etc. via SPI on each interrupt). If each of these transfers takes a > millisecond it becomes impossible to service even a single fairly slow > serial connection. I have to service 8 relatively fast serial ports so > I can't put up with 99% SPI unavailability. I thought you might be doing lots of little transfers; that would explain your need. Is each transfer in a separate message, because you have to interact with every transfer, or can you put several transfers in one message, to be executed without higher level interaction? The latter would be faster because each transfer would be pumped in interrupt context, rather than from a work queue. > I'm using an earlier kernel (2.6.16) which I've back-ported the latest > SPI code so it's possible that tasklets work better in more recent > kernels. Anyone know if that might be true? Well I do know that the version of pxa2xx_spi.c in 2.6.16 is very old. How much did you backport? Did you get Stephen's 12/7/06 patches? Reading your test above, I would guess you did, but wonder what you had to change to get the "latest" to compile in 2.6.16. Does your pxa2xx_spi.c contain the function "set_dma_burst_and_threshold()"? If it does, you have Stephen's 12/7/06 patch. >> I assume that "removing the tasklet" means calling pump_transfers() >> directly from the interrupt service routines, rather than having the >> ISRs schedule a tasklet to make that call. Right? > > Not exactly. pump_transfers() is called from pump_messages() and a few > other places, all of which run in a workqueue. So it's not called from > interrupt context but from a workqueue. Actually, it is scheduled from each of the xx_error_stop() and xx_transfer_complete() routines, which all run in interrupt context. It is also scheduled from pump_messages, which is the only place it is scheduled from a work_queue. So it is scheduled from a work_queue only for each new message, but from interrupt context for each transfer within a message, and at the end of each each message to call giveback(). >> I believe tasklets will never run later than the next timer >> tick, which I believe is 1ms on most modern processors (but which can be >> changed). Thus 1ms should be the maximum latency; I would expect better >> than 1ms most of the time. > > I seem to be seeing 1ms consistently on a Compulab cm-x270 - or at > least I did until I made this change. You might want to check the kernel config parameters for your system and make sure that the timer ticks are at least 1000HZ. I notice that mine is set by default (by Gumstix perhaps) to be a tickless system (dynamic ticks): CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y I don't know if this helps; it is just the way mine is set. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <47B12406.9040208-/d+BM93fTQY@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <47B12406.9040208-/d+BM93fTQY@public.gmane.org> @ 2008-02-12 5:24 ` Zik Saleeba [not found] ` <33e9dd1c0802112124y5ae8dd39ua9078f2b3878a018-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: Zik Saleeba @ 2008-02-12 5:24 UTC (permalink / raw) To: Ned Forrester Cc: David Brownell, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Feb 12, 2008 3:43 PM, Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: > > I thought you might be doing lots of little transfers; that would > explain your need. Is each transfer in a separate message, because you > have to interact with every transfer, or can you put several transfers > in one message, to be executed without higher level interaction? Yes, good idea. Unfortunately the transfers are of the form "how many bytes are waiting" followed by "please send x bytes" so they need to be separate transfers. > Does your pxa2xx_spi.c contain the function > "set_dma_burst_and_threshold()"? If it does, you have Stephen's > 12/7/06 patch. Yes thanks, I have that function. > Actually, it is scheduled from each of the xx_error_stop() and > xx_transfer_complete() routines, which all run in interrupt context. Yep sorry - picked that one up in my followup email as I'm sure you've already seen. > You might want to check the kernel config parameters for your system and > make sure that the timer ticks are at least 1000HZ. I notice that mine > is set by default (by Gumstix perhaps) to be a tickless system (dynamic > ticks): > > CONFIG_TICK_ONESHOT=y > CONFIG_NO_HZ=y > CONFIG_HIGH_RES_TIMERS=y > > I don't know if this helps; it is just the way mine is set. I'll give it a try. Thanks. Cheers, Zik ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <33e9dd1c0802112124y5ae8dd39ua9078f2b3878a018-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <33e9dd1c0802112124y5ae8dd39ua9078f2b3878a018-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2008-02-12 20:48 ` Zik Saleeba 0 siblings, 0 replies; 23+ messages in thread From: Zik Saleeba @ 2008-02-12 20:48 UTC (permalink / raw) To: Ned Forrester Cc: David Brownell, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Followup on this problem: I did some experimentation using workqueues differently in my serial-via-spi driver and it seems to have fixed the problem with poor throughput in the pxa2xx_spi driver. I was previously using the default workqueue in my serial driver but changed it to use separate workqueues. To my surprise this completely fixed the problem I was having with the pxa2xx_spi driver being slow to respond to dma_transfer_complete(). Presumably the extra load on the default shared workqueue was holding up the service tasklet in pxa2xx_spi. So thanks to all - the problem is resolved at last. For what it's worth I'm still getting a 90us chip select period for only 6us of actual SPI transfer for short requests but I think that's down to the speed of the processor and the driver overhead. Thanks everyone for all the suggestions you've offered in the last couple of days! Cheers, Zik ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <47B11178.6090904-/d+BM93fTQY@public.gmane.org> 2008-02-12 3:48 ` Zik Saleeba @ 2008-02-12 4:20 ` David Brownell 1 sibling, 0 replies; 23+ messages in thread From: David Brownell @ 2008-02-12 4:20 UTC (permalink / raw) To: Ned Forrester Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Monday 11 February 2008, Ned Forrester wrote: > I believe tasklets will never run later than the next timer > tick, which I believe is 1ms on most moder processors (but which can be > changed). Thus 1ms should be the maximum latency; I would expect better > than 1ms most of the time. Actually, on ARM the default HZ is 100, not 1000 ... so the default latency would be 10 msec not 1 msec. Recent kernels have oneshot clockevent support for PXA though, so there are many options. I don't recall whether HZ is settable there, but high-res timers might affect scheduling a bit. - Dave ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <20080211174339.73ca7ed5.merrij3-IL7dBOYR4Vg@public.gmane.org> 2008-02-11 22:54 ` Zik Saleeba @ 2008-02-11 23:26 ` Ned Forrester [not found] ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> [not found] ` <20081107184819.54baa679.merrij3@rpi.edu> 1 sibling, 2 replies; 23+ messages in thread From: Ned Forrester @ 2008-02-11 23:26 UTC (permalink / raw) To: J. Scott Merritt Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR J. Scott Merritt wrote: > Reposted from linux-arm-kernel mailing list ... at the suggestion > of David Brownell. > > Using pxa2xx_spi from Kernel 2.6.21. PXA270 is SSP Master in > SPI_MODE_3 (SPO=1, SPH=1) with internal clock and GPIO's used as > external chip selects. > > On the very first message after boot, I am receiving an extra clock > pulse at the slave device (causing the whole message to be "shifted"). > > It appears that on the first message, the Chip Select is activated > before the SPO=1/SPH=1 is fully established in the PXA's SSP hardware, > resulting in an extra, positive-going transition of SSPSCLK as it > attempts to establish the proper (high) clock level for the SPH=1 > setting. > > Testing was performed on Kernal 2.6.21, but it appears that 2.6.24 > also performs the chip select call before updating SSCR0 & SSCR1. > Slave device is NXP LPC2366. > > I have tried setting the "default" in pxa2xx_spi.c to SPO=1, SPH=1; > but still have the clock riding low before the first chip select. I have worked with this driver extensively, so I took a look. It seems that you are right. The these bits are set at the end of pump_transfers(), based on internal values that are configured in setup(). The SSCR1 register is initialized with default data in probe(), but this default value is not influenced from any other configuration data. SSCR1 contains the interrupt enables and dma service request enables. As such, it is set as the very last task in pump_transfers(), after all other setup, including the call to cs_control(). If I recall correctly, in some modes of operation, it is important that this register not be written when activity is already in progress, and so it is only set if changed at the end of pump_transfers(). It appears that there needs to be a check for changed clock mode in pump_transfers(), prior to setting chip select. It is important to do this carefully, so as not to interfere with on-going operations. I will take a closer look at this. I know that normally each of these SPI transfers is independent, but I worked with Stephen Street (the maintainer) over a year ago to prep this driver for some external master data steaming that I needed, and I know we worked in this area of the driver. It is possible that we messed this up, as I don't use chip select. > ... David Brownell responds: > > Actually, the SPI_CPOL mode bit controls the clock polarity > before the chip select coes active: CPOL=0 should mean it's > forced low (if it isn't already low). > > So if that driver is doing chipselect *then* clock setup, > it's doing the wrong thing. A patch would be appreciated... > > ... > Thanks, Scott. > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > spi-devel-general mailing list > spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org > https://lists.sourceforge.net/lists/listinfo/spi-devel-general > > -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org>]
* Re: PXA270 SSPSFRM gates chip select ? [not found] ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> @ 2008-02-12 4:08 ` Ned Forrester 2008-11-07 16:43 ` PXA270 SSP DMA Corruption J. Scott Merritt 2008-11-07 19:00 ` PXA270 SSP DMA Corruption - correction J. Scott Merritt 2 siblings, 0 replies; 23+ messages in thread From: Ned Forrester @ 2008-02-12 4:08 UTC (permalink / raw) To: Ned Forrester Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, J. Scott Merritt, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Ned Forrester wrote: > J. Scott Merritt wrote: >> Reposted from linux-arm-kernel mailing list ... at the suggestion >> of David Brownell. >> >> Using pxa2xx_spi from Kernel 2.6.21. PXA270 is SSP Master in >> SPI_MODE_3 (SPO=1, SPH=1) with internal clock and GPIO's used as >> external chip selects. >> >> On the very first message after boot, I am receiving an extra clock >> pulse at the slave device (causing the whole message to be "shifted"). >> >> It appears that on the first message, the Chip Select is activated >> before the SPO=1/SPH=1 is fully established in the PXA's SSP hardware, >> resulting in an extra, positive-going transition of SSPSCLK as it >> attempts to establish the proper (high) clock level for the SPH=1 >> setting. >> >> Testing was performed on Kernal 2.6.21, but it appears that 2.6.24 >> also performs the chip select call before updating SSCR0 & SSCR1. >> Slave device is NXP LPC2366. >> >> I have tried setting the "default" in pxa2xx_spi.c to SPO=1, SPH=1; >> but still have the clock riding low before the first chip select. > > I have worked with this driver extensively, so I took a look. It seems > that you are right. The these bits are set at the end of > pump_transfers(), based on internal values that are configured in > setup(). The SSCR1 register is initialized with default data in > probe(), but this default value is not influenced from any other > configuration data. > > SSCR1 contains the interrupt enables and dma service request enables. > As such, it is set as the very last task in pump_transfers(), after all > other setup, including the call to cs_control(). If I recall correctly, > in some modes of operation, it is important that this register not be > written when activity is already in progress, and so it is only set if > changed at the end of pump_transfers(). It appears that there needs to > be a check for changed clock mode in pump_transfers(), prior to setting > chip select. It is important to do this carefully, so as not to > interfere with on-going operations. > > I will take a closer look at this. I know that normally each of these > SPI transfers is independent, but I worked with Stephen Street (the > maintainer) over a year ago to prep this driver for some external master > data steaming that I needed, and I know we worked in this area of the > driver. It is possible that we messed this up, as I don't use chip select. I have looked some more, and I see what happened. In the fall of 2006 I worked with Stephen to fix some bugs, and to implement a few changes that would facilitate chaining of transfers and messages when no configuration changes are required between transfers. In the process, we removed the function restore_state(), which was called only from pump_messages() and which stopped the SSP and set up the SSCRx registers. All of the SSCRx changes were reduced to a single, final code segment at the end of pump_transfers, so as to create a maintainable place to manage the "set only if changed" function required by chaining of transfers. However, removing the call in pump_messages() has the side effect of not setting up SSCCRx until *after* any CS change, and thus a configuration change will not occur until CS has been set on the first transfer that uses that config change. These code changes were implemented by a patch released by Stephen on 12/7/06 (I'm not sure which kernel release it first appeared in). I will try to think about a patch tomorrow. I volunteer to be involved in this so that it gets fixed without breaking the other things I need to do. Hopefully Stephen will be available to review any patches. >> ... David Brownell responds: >> >> Actually, the SPI_CPOL mode bit controls the clock polarity >> before the chip select coes active: CPOL=0 should mean it's >> forced low (if it isn't already low). >> >> So if that driver is doing chipselect *then* clock setup, >> it's doing the wrong thing. A patch would be appreciated... >> >> ... >> Thanks, Scott. >> >> ------------------------------------------------------------------------- >> This SF.net email is sponsored by: Microsoft >> Defy all challenges. Microsoft(R) Visual Studio 2008. >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ >> _______________________________________________ >> spi-devel-general mailing list >> spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org >> https://lists.sourceforge.net/lists/listinfo/spi-devel-general >> >> > > -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* PXA270 SSP DMA Corruption [not found] ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> 2008-02-12 4:08 ` Ned Forrester @ 2008-11-07 16:43 ` J. Scott Merritt [not found] ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org> 2008-11-07 19:00 ` PXA270 SSP DMA Corruption - correction J. Scott Merritt 2 siblings, 1 reply; 23+ messages in thread From: J. Scott Merritt @ 2008-11-07 16:43 UTC (permalink / raw) To: Ned Forrester Cc: David Brownell, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Long ago, I posted a message on linux-arm-kernel describing data corruption that I was seeing on PXA270 SSP transfers using DMA: http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2 I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL interface provided by spidev ... and unfortunately am still seeing data corruption with DMA transfers. I have a user program that is sending single blocks (1-255 words) of fabricated, non-zero data to an outbound processor (at 300 kHz). I find that after a small number of blocks (10-20), the outboard processor will receive a packet that begins with multiple zeros, rather than the intended data. I have put printk statements into spidev that show that the correct, non-zero data is always present in the DMA bounce buffer before and after the transfer - even on the packets that fail to arrive. If I display the DMA flag in the platform data, then the transfers continue indefinitely without any problems. More interestingly, while using DMA, if I simply insert a sched_yield (on an otherwise unoccupied processor), before each transfer from user land, the problem disappears ! In my original posting last year, I suspected a cache coherency problem. However, based on the latest symptoms, I'm wondering if the SSP transfer might be initiated before the DMA controller is ready/able to provide source data .... Thanks, Scott. ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-11-07 18:59 ` Ned Forrester 0 siblings, 0 replies; 23+ messages in thread From: Ned Forrester @ 2008-11-07 18:59 UTC (permalink / raw) To: J. Scott Merritt; +Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f J. Scott Merritt wrote: > Long ago, I posted a message on linux-arm-kernel describing data corruption > that I was seeing on PXA270 SSP transfers using DMA: > http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2 > > I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL > interface provided by spidev ... and unfortunately am still seeing > data corruption with DMA transfers. pxa2xx_spi.c has been pretty well wrung out by various people. I worked with Stephen preparing fixes that I think were introduced in 2.6.20. While my task is receive-only, and it looks like your task is transmit-only, I did a lot of loopback testing at that time, without the sort of trouble you report. Since then Stephen seems to have moved on to other interests. > I have a user program that is sending single blocks (1-255 words) of > fabricated, non-zero data to an outbound processor (at 300 kHz). I > find that after a small number of blocks (10-20), the outboard processor > will receive a packet that begins with multiple zeros, rather than the > intended data. > > I have put printk statements into spidev that show that the correct, > non-zero data is always present in the DMA bounce buffer before and > after the transfer - even on the packets that fail to arrive. I assume you are still talking about spi_char as the protocol driver. Right? You may be having trouble with the way the buffers are allocated. I think, but am not sure, that PXA wants the buffers aligned on 32-bit boundaries. I note that spi_char statically allocates the buffers in the middle of the struct spichar_driver. You might want to allocate the buffers separately, and change struct spichar_driver to have only pointers to the separately allocated buffers. Then.... In my protocol driver, I allocate my buffers with a few more flags: d_buf = (char *)kzalloc(count, GFP_KERNEL | __GFP_DMA | __GFP_COLD); I think the important one is probably __GFP_DMA. I'm not sure how important that is on PXA, because I think you can DMA from any memory, but it may enforce needed alignment. At least alignment checks never fail when I do this, but perhaps they would not anyway. Also note that if pxa2xx_spi discovers that the allocated buffers are not aligned on a 32-bit boundary or if dma_map_single fails, it will silently perform the transfer in PIO mode. You can also try doing the dma mapping in the protocol driver, and passing the mapped address in struct spi_transfer->tx_buf. David Brownell pointed out to me once that this is more efficient, because it allows the kernel to free the cache line associated with the buffer sooner than if it is left to pxa2xx_spi. If you do map in the protocol driver, don't forget to unmap after the transfer is given back from pxa2xx_spi. > If I display the DMA flag in the platform data, then the transfers > continue indefinitely without any problems. More interestingly, while > using DMA, if I simply insert a sched_yield (on an otherwise unoccupied > processor), before each transfer from user land, the problem > disappears ! > > In my original posting last year, I suspected a cache coherency problem. > However, based on the latest symptoms, I'm wondering if the SSP > transfer might be initiated before the DMA controller is ready/able > to provide source data .... Anything is possible, but I doubt that there is any such problem in pxa2xx_spi. Let us know what happens after changing the allocation of the buffers. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 23+ messages in thread
* PXA270 SSP DMA Corruption - correction [not found] ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> 2008-02-12 4:08 ` Ned Forrester 2008-11-07 16:43 ` PXA270 SSP DMA Corruption J. Scott Merritt @ 2008-11-07 19:00 ` J. Scott Merritt 2 siblings, 0 replies; 23+ messages in thread From: J. Scott Merritt @ 2008-11-07 19:00 UTC (permalink / raw) To: Ned Forrester Cc: David Brownell, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR If I display the DMA flag in the platform data, then the transfers ^^^^^^^ -> disable continue indefinitely without any problems. More interestingly, while using DMA, if I simply insert a sched_yield (on an otherwise unoccupied processor), before each transfer from user land, the problem disappears ! ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20081107184819.54baa679.merrij3@rpi.edu>]
[parent not found: <491B6249.7070407@whoi.edu>]
[parent not found: <491B6249.7070407-/d+BM93fTQY@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <491B6249.7070407-/d+BM93fTQY@public.gmane.org> @ 2008-11-13 1:24 ` J. Scott Merritt [not found] ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: J. Scott Merritt @ 2008-11-13 1:24 UTC (permalink / raw) To: Ned Forrester Cc: david-b-yBeKhBN/0LDR7s880joybQ, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Wed, 12 Nov 2008 18:10:01 -0500 Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: > J. Scott Merritt wrote: > > Long ago, I posted a message on linux-arm-kernel describing data > > corruption that I was seeing on PXA270 SSP transfers using DMA: > > http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2 > > > > I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL > > interface provided by spidev ... and unfortunately am still seeing > > data corruption with DMA transfers. > > > > I have managed to reproduce the problem using the internal loopback > > facility of the PXA270 SSP hardware so it should be possible to > > test this problem in other environments. On my system, the program > > below reports an error within 30-50 attempts. From the data reported, > > it would appear the the DMA controller (or memory cache) is transferring > > the contents of the previous buffer rather than the current buffer. > > > > I have also included the platform data that initializes the SSP > > device drivers. Perhaps I have misconfigured or misued this in some > > way ??? Note that simply disabling DMA in the platform data allows > > the test to run indefinitely without errors. > > [snip code] > > Sorry for the delayed reply. > > I have looked over your code, and tried to familiarize myself with > spidev. I cannot easily test your code because I am still using 2.6.20, > and spidev is was not introduced until 2.6.22. I don't see any obvious > problems with either your test code or spidev. Certainly, spidev is of > similar complexity to pxa2xx_spi, so I would say that they are similarly > likely to have issues. I don't know how much use spidev has gotten, but > it is probably more than pxa2xx_spi, because it works across platforms. > > There are a couple of things to consider: > > 1. spidev ought to allocate "buffer" using the __GFP_DMA flag. I think > you already showed that this is not causing your problem. I think that > on ARM, all memory can be used for DMA; that is not true on all > architectures and some apparently will silently allocate bounce buffers > for non-dma-accessible memory. > > 2. spidev uses the same buffer for tx and rx. That is supposed to be > allowed, but I don't think pxa2xx_spi handles this case correctly. > pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it > uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx > buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers > are the same. Thus if they are the same, the memory is mapped twice. > "Linux Device Drivers", Corbet, et al. does not address this > possibility, but I bet it is not a good thing to do. > > If you are willing, it looks simple to modify spidev to use two buffers, > and test whether that works better. I agree that spidev should work > either way, but this would be a quick test. If that works, I will > submit a patch for pxa2xx_spi to fix the case of same buffers. > > Alternatively, you might prefer to try fixing map_dma_buffers() and > unmap_dma_buffers() in pxa2xx_spi.c to test this theory. If it works, > that would be the proper fix, anyway. Basically it needs to trap the > case of same address (or even overlapping ranges) and do one > dma_map_single using the DMA_BIDIRECTIONAL flag. Unmapping has to use > the same flag. I think I would choose to make it fail on overlapped but > unequal ranges, and perform correctly for the cases of separate or equal. > [snip performance suggestions] Ned, Thank for your thorough and thoughtful review. It looks like your concerns regarding the duplicate mapping of the overlapping buffers were correct. I tried both suggestions - namely, using two buffers in spidev, as well as cutting back to a single buffer mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c. Each of these changes (by themselves) seemed to eliminate the data corruption in my simple test program ! However, I have some lingering concerns regarding the latter solution. >From what little I have read, it appears the DMA-BIDIRECTIONAL is intended for situations where the transfer direction is not know ... it is not immediately clear (to me) that it also handles our case, which might be better described as DMA_SIMULTANEOUS. It is possible that using DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding of the buffering and caching that is going on between the two independent DMA channel it is difficult for me to speculate. Many thanks, Scott. ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-11-13 1:48 ` Ned Forrester [not found] ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org> 0 siblings, 1 reply; 23+ messages in thread From: Ned Forrester @ 2008-11-13 1:48 UTC (permalink / raw) To: J. Scott Merritt Cc: david-b-yBeKhBN/0LDR7s880joybQ, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR J. Scott Merritt wrote: > On Wed, 12 Nov 2008 18:10:01 -0500 > Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: > >> 2. spidev uses the same buffer for tx and rx. That is supposed to be >> allowed, but I don't think pxa2xx_spi handles this case correctly. >> pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it >> uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx >> buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers >> are the same. Thus if they are the same, the memory is mapped twice. >> "Linux Device Drivers", Corbet, et al. does not address this >> possibility, but I bet it is not a good thing to do. >> >> If you are willing, it looks simple to modify spidev to use two buffers, >> and test whether that works better. I agree that spidev should work >> either way, but this would be a quick test. If that works, I will >> submit a patch for pxa2xx_spi to fix the case of same buffers. >> >> Alternatively, you might prefer to try fixing map_dma_buffers() and >> unmap_dma_buffers() in pxa2xx_spi.c to test this theory. If it works, >> that would be the proper fix, anyway. Basically it needs to trap the >> case of same address (or even overlapping ranges) and do one >> dma_map_single using the DMA_BIDIRECTIONAL flag. Unmapping has to use >> the same flag. I think I would choose to make it fail on overlapped but >> unequal ranges, and perform correctly for the cases of separate or equal. >> > [snip performance suggestions] > > Ned, Thank for your thorough and thoughtful review. > > It looks like your concerns regarding the duplicate mapping of the > overlapping buffers were correct. I tried both suggestions - namely, > using two buffers in spidev, as well as cutting back to a single buffer > mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c. Each of these changes > (by themselves) seemed to eliminate the data corruption in my simple > test program ! That's good. I will write a patch for pxa2xx_spi, unless you would rather do it. > However, I have some lingering concerns regarding the latter solution. >>From what little I have read, it appears the DMA-BIDIRECTIONAL is intended > for situations where the transfer direction is not know ... it is not > immediately clear (to me) that it also handles our case, which might be > better described as DMA_SIMULTANEOUS. It is possible that using > DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding > of the buffering and caching that is going on between the two independent > DMA channel it is difficult for me to speculate. According to "Linux Device Drivers": "If data can move in either direction, use DMA_BIDIRECTIONAL." and "Incidentally, bounce buffers are one reason why it is important to get the direction right. DMA_BIDIRECTIONAL bounce buffers are copied both before and after the operation, which is often an unnecessary waste of CPU cycles." >From the general discussion in the book, it is clear that dma mapping takes care of issues like flushing cache (for DMA_TO_DEVICE), invalidating cache (for DMA_FROM_DEVICE), setting up any I/O memory mapping and scatter/gather, etc. Once stream mapped (which dma_map_single() does), the CPU is not allowed to touch the memory, and once unmapped, the device cannot touch the memory. I think it is clear that DMA_BIDIRECTIONAL is for exactly the case where the device will do some DMA in each direction before giving the memory back to the CPU. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <491B8783.9050800-/d+BM93fTQY@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org> @ 2008-11-13 1:59 ` Ned Forrester 0 siblings, 0 replies; 23+ messages in thread From: Ned Forrester @ 2008-11-13 1:59 UTC (permalink / raw) To: J. Scott Merritt Cc: david-b-yBeKhBN/0LDR7s880joybQ, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Ned Forrester wrote: > J. Scott Merritt wrote: >> On Wed, 12 Nov 2008 18:10:01 -0500 >> Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: >> >>> 2. spidev uses the same buffer for tx and rx. That is supposed to be >>> allowed, but I don't think pxa2xx_spi handles this case correctly. >>> pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it >>> uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx >>> buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers >>> are the same. Thus if they are the same, the memory is mapped twice. >>> "Linux Device Drivers", Corbet, et al. does not address this >>> possibility, but I bet it is not a good thing to do. >>> >>> If you are willing, it looks simple to modify spidev to use two buffers, >>> and test whether that works better. I agree that spidev should work >>> either way, but this would be a quick test. If that works, I will >>> submit a patch for pxa2xx_spi to fix the case of same buffers. >>> >>> Alternatively, you might prefer to try fixing map_dma_buffers() and >>> unmap_dma_buffers() in pxa2xx_spi.c to test this theory. If it works, >>> that would be the proper fix, anyway. Basically it needs to trap the >>> case of same address (or even overlapping ranges) and do one >>> dma_map_single using the DMA_BIDIRECTIONAL flag. Unmapping has to use >>> the same flag. I think I would choose to make it fail on overlapped but >>> unequal ranges, and perform correctly for the cases of separate or equal. >>> >> [snip performance suggestions] >> >> Ned, Thank for your thorough and thoughtful review. >> >> It looks like your concerns regarding the duplicate mapping of the >> overlapping buffers were correct. I tried both suggestions - namely, >> using two buffers in spidev, as well as cutting back to a single buffer >> mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c. Each of these changes >> (by themselves) seemed to eliminate the data corruption in my simple >> test program ! > > That's good. I will write a patch for pxa2xx_spi, unless you would > rather do it. > >> However, I have some lingering concerns regarding the latter solution. >> >From what little I have read, it appears the DMA-BIDIRECTIONAL is intended >> for situations where the transfer direction is not know ... it is not >> immediately clear (to me) that it also handles our case, which might be >> better described as DMA_SIMULTANEOUS. It is possible that using >> DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding >> of the buffering and caching that is going on between the two independent >> DMA channel it is difficult for me to speculate. > > According to "Linux Device Drivers": > > "If data can move in either direction, use DMA_BIDIRECTIONAL." > > and > > "Incidentally, bounce buffers are one reason why it is important to get > the direction right. DMA_BIDIRECTIONAL bounce buffers are copied both > before and after the operation, which is often an unnecessary waste of > CPU cycles." > >>From the general discussion in the book, it is clear that dma mapping > takes care of issues like flushing cache (for DMA_TO_DEVICE), > invalidating cache (for DMA_FROM_DEVICE), setting up any I/O memory > mapping and scatter/gather, etc. Once stream mapped (which > dma_map_single() does), the CPU is not allowed to touch the memory, and > once unmapped, the device cannot touch the memory. I think it is clear > that DMA_BIDIRECTIONAL is for exactly the case where the device will do > some DMA in each direction before giving the memory back to the CPU. I should have added that the logic of SPI is that for every byte written DMA tx), there is a byte read (DMA rx). Because the clocks that read are generated by the clocks that transmit, the tx word has to be in the FIFO before the rx word is returned to the FIFO. Thus the tx and rx DMA channels will never access the same word at the same time, if the tx and rx buffers start at the same address. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20081112213403.402948b9.merrij3@rpi.edu>]
[parent not found: <20081112213403.402948b9.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <20081112213403.402948b9.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-11-13 3:19 ` Ned Forrester [not found] ` <20081113120134.70d533c8.merrij3@rpi.edu> 0 siblings, 1 reply; 23+ messages in thread From: Ned Forrester @ 2008-11-13 3:19 UTC (permalink / raw) To: J. Scott Merritt; +Cc: spi-devel J. Scott Merritt wrote: > On Wed, 12 Nov 2008 18:10:01 -0500 > It appears that the timeout is computed based upon the "Peripheral Clock > frequency" on the PXA270 - which would appear to be 312Mhz, (or something > divided down from there). If it is 312Mhz, then for my SSP speed of > 300K, I guess I need something around 10,000. It says "peripheral clock frequency" without ever defining that phrase (at least in the 1/2006 version of the developer's manual). Oddly on PXA255 this frequency appears to be runclock/4 = 99.5MHz for a 400MHz machine. I bet it is not as high as 312MHz, but you can measure it by setting very long times, forcing the timeout (no tx data in pio mode, I think), and perhaps using GPIO probes to trigger a scope. I did something like that on my system. It would sure be nice to know the answer to that, as no one using PXA270 has ever answered the question. > Table 8-4 in my PXA manual is titled "TFT and RFT values with possible > DMA Burst Sizes" ... for 8 bit data, it says ... for TFT > 7 > "Do not use DMA" ... Am I misinterpreting this ? I -think- that means > that 8/8 is not a good choice ... The table you refer to lists the values loaded into the TFT and RFT bit fields, which are the desired threshold-1. The values passed to pxa2xx_spi are the desired thresholds, so TFT>7 is the same as threshold>8. > With respect to the pxa2xx_spi patch, please proceed - I will certainly > not attempt to generate one myself. I'll get something out in the next couple of days. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <20081113120134.70d533c8.merrij3@rpi.edu>]
[parent not found: <20081113120134.70d533c8.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <20081113120134.70d533c8.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-11-13 18:54 ` Ned Forrester 0 siblings, 0 replies; 23+ messages in thread From: Ned Forrester @ 2008-11-13 18:54 UTC (permalink / raw) To: J. Scott Merritt; +Cc: spi-devel J. Scott Merritt wrote: > On Wed, 12 Nov 2008 22:19:49 -0500 > Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: > >> It says "peripheral clock frequency" without ever defining that phrase >> (at least in the 1/2006 version of the developer's manual). Oddly on >> PXA255 this frequency appears to be runclock/4 = 99.5MHz for a 400MHz >> machine. I bet it is not as high as 312MHz, but you can measure it by >> setting very long times, forcing the timeout (no tx data in pio mode, I >> think), and perhaps using GPIO probes to trigger a scope. I did >> something like that on my system. It would sure be nice to know the >> answer to that, as no one using PXA270 has ever answered the question. >> >>> Table 8-4 in my PXA manual is titled "TFT and RFT values with possible >>> DMA Burst Sizes" ... for 8 bit data, it says ... for TFT > 7 >>> "Do not use DMA" ... Am I misinterpreting this ? I -think- that means >>> that 8/8 is not a good choice ... > > My manual seemed to define "peripheral clock" in multiple places with > multiple (conflicting) frequencies (or options) ;) That sounds about like what I found. > I regret that I am probably not able to attempt the clock measurement > effort at this time - too many other pressures .... If you are really > anxious to resolve this, I might be able to loan you a PXA270 system > for a couple of weeks. Alternatively, I might be able to find time > for a couple of Kernel builds/loads if you wanted to build all of the > test programs and patches ... Excessive busyness affects all of us. I must decline. >> The table you refer to lists the values loaded into the TFT and RFT bit >> fields, which are the desired threshold-1. The values passed to >> pxa2xx_spi are the desired thresholds, so TFT>7 is the same as threshold>8. > > You are correct ... of course ... :) > >>>> With respect to the pxa2xx_spi patch, please proceed - I will certainly >>> not attempt to generate one myself. >> I'll get something out in the next couple of days. > > Any chance we could get it accepted as a repair into 2.6.27 ? Yes, I plan to submit the patch to the stable tree, also. It usually takes a few weeks to propagate. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2008-11-13 18:54 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-02-11 22:43 PXA270 SSPSFRM gates chip select ? J. Scott Merritt [not found] ` <20080211174339.73ca7ed5.merrij3-IL7dBOYR4Vg@public.gmane.org> 2008-02-11 22:54 ` Zik Saleeba [not found] ` <33e9dd1c0802111454k5deeaa38o9d21cee610b79da7-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-12 2:51 ` David Brownell [not found] ` <200802111851.10155.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 2008-02-12 3:15 ` Zik Saleeba [not found] ` <33e9dd1c0802111915q48cb80ecxb33461a9263f9295-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-12 4:02 ` David Brownell 2008-02-12 3:24 ` Ned Forrester [not found] ` <47B11178.6090904-/d+BM93fTQY@public.gmane.org> 2008-02-12 3:48 ` Zik Saleeba [not found] ` <33e9dd1c0802111948u2256d0adj8caa478073795d78-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-12 4:16 ` David Brownell 2008-02-12 4:19 ` Zik Saleeba 2008-02-12 4:43 ` Ned Forrester [not found] ` <47B12406.9040208-/d+BM93fTQY@public.gmane.org> 2008-02-12 5:24 ` Zik Saleeba [not found] ` <33e9dd1c0802112124y5ae8dd39ua9078f2b3878a018-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2008-02-12 20:48 ` Zik Saleeba 2008-02-12 4:20 ` David Brownell 2008-02-11 23:26 ` Ned Forrester [not found] ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> 2008-02-12 4:08 ` Ned Forrester 2008-11-07 16:43 ` PXA270 SSP DMA Corruption J. Scott Merritt [not found] ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org> 2008-11-07 18:59 ` Ned Forrester 2008-11-07 19:00 ` PXA270 SSP DMA Corruption - correction J. Scott Merritt [not found] ` <20081107184819.54baa679.merrij3@rpi.edu> [not found] ` <491B6249.7070407@whoi.edu> [not found] ` <491B6249.7070407-/d+BM93fTQY@public.gmane.org> 2008-11-13 1:24 ` PXA270 SSP DMA Corruption J. Scott Merritt [not found] ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org> 2008-11-13 1:48 ` Ned Forrester [not found] ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org> 2008-11-13 1:59 ` Ned Forrester [not found] ` <20081112213403.402948b9.merrij3@rpi.edu> [not found] ` <20081112213403.402948b9.merrij3-IL7dBOYR4Vg@public.gmane.org> 2008-11-13 3:19 ` Ned Forrester [not found] ` <20081113120134.70d533c8.merrij3@rpi.edu> [not found] ` <20081113120134.70d533c8.merrij3-IL7dBOYR4Vg@public.gmane.org> 2008-11-13 18:54 ` Ned Forrester
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).