From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Sperl
Subject: Re: Depreciated spi_master.transfer and "prepared spi messages" for an optimized pipelined-SPI-DMA-driver
Date: Wed, 13 Nov 2013 19:35:27 +0100
Message-ID:
References: <86AE15B6-05AF-4EFF-8B8F-10806A7C148B@sperl.org> <20131108161957.GP2493@sirena.org.uk> <5F70E708-89B9-4DCF-A31A-E688BAA0E062@sperl.org> <20131108180934.GQ2493@sirena.org.uk> <20131109183056.GU2493@sirena.org.uk> <6C7903B3-8563-490E-AD7D-BA5D65FFB9BC@sperl.org> <20131112011954.GH2674@sirena.org.uk> <52823E73.503@sperl.org> <2252E63E-176C-43F7-B259-D1C3A142DAFE@sperl.org> <20131113154346.GT878@sirena.org.uk>
Mime-Version: 1.0 (Apple Message framework v1283)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Mark Brown
Return-path:
In-Reply-To: <20131113154346.GT878-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
Sender: linux-spi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:

Hi Mark

>> As for other interesting measurements a single example with 5 transfers:
>> Interrupt to __spi_async: 19us
>> __spi_async sanity start/end: 2us
>> __SPI_ASYNC to DMA_PREPARE: 99us
>> dma_prepare start/end: 40us
>> dma_prepare_end to CS DOWN: 4us
>> CS DOWN to CS UP: 16us (real transfer)
>
> This is making me question the use of DMA at all here, this looks like
> the situation of a lot of drivers where they switch to PIO mode for
> small transfers since the cost of managing DMA is too great. I'm also
> curious which parts of the DMA preparation are expensive - is it the
> building of the datastructure for DMA or is it dealing with the
> coherency issues for the DMA controller? The dmaengine API currently
> needs transfers rebuilding each time I believe...
>
> Also how does this scale for larger messages?
>
> I appreciate that you want to push the entire queue down into hardware,
> I'm partly thinking of the costs for drivers that don't go and do any of
> the precooking here.

Well - if you look at the above example: it takes 99us to get from
__spi_async (post-message check) to dma_prepare inside
transfer_one_message. And this is time spent in the framework AND the
scheduler!

The DMA prepare itself takes 40us, so you can still run it in that time
and come out faster... OK, I did not account for the "teardown" time,
but that should be faster, as it just walks the list and returns it to
the dmapool.

Also keep in mind that this message is (as said) actually comprised of
5 spi_transfers in a chain, two of which have CHANGE_CS set, so it is
already a "bigger" spi_message than you would typically see. A
write_then_read (2 transfers) has a "setup time" of about 23us. But as
this transfer happens during "setup", the dmapool may not have any
pages allocated yet, which increases the allocation overhead and thus
biases this measurement.

You also see that this driver is already trying to keep the latencies
as short as possible by using chaining instead of issuing each of those
CS-enable sequences as a separate message (a minimal example of such a
chained message is sketched further below).

So in the end we can give a rough estimate that it takes about 10us per
transfer to process - which means we get the same throughput/delay for
DMA (prepared) versus interrupt-driven (polling being the worst) for
spi_messages of <= 10 transfers. Which is already a pretty big
spi_message...

But then you should not forget that with the interrupt-driven approach
you have delays between each transfer, caused by the interrupts needed
to reconfigure the data registers.
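Just to illustrate what I mean by chaining above: the 5-transfer
message is built on the client side roughly like this (a minimal sketch
against the stock spi_async() API - the buffers, lengths and the
completion callback are only placeholders, not code from my driver):

#include <linux/spi/spi.h>
#include <linux/string.h>

/* placeholder completion callback */
static void my_complete(void *context)
{
	/* process the received data, release the buffers, ... */
}

static int queue_chained_message(struct spi_device *spi,
				 void *tx[5], void *rx[5], size_t len[5])
{
	/* message and transfers must stay valid until my_complete()
	 * runs, so a real driver keeps them in per-message state -
	 * static here only to keep the sketch short */
	static struct spi_transfer xfer[5];
	static struct spi_message msg;
	int i;

	memset(xfer, 0, sizeof(xfer));
	spi_message_init(&msg);

	for (i = 0; i < 5; i++) {
		xfer[i].tx_buf = tx[i];
		xfer[i].rx_buf = rx[i];
		xfer[i].len    = len[i];
		spi_message_add_tail(&xfer[i], &msg);
	}

	/* toggle CS inside the chain instead of splitting this into
	 * separate spi_messages - that is what keeps the gaps between
	 * the transfers small */
	xfer[1].cs_change = 1;
	xfer[3].cs_change = 1;

	msg.complete = my_complete;
	msg.context  = NULL;

	return spi_async(spi, &msg);
}

The whole thing still goes through the framework as one spi_message, so
the setup cost above is paid once per message, not once per CS toggle.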
And I am 100% certain that you will not be able to achieve a 1us delay
between 2 transfers (of arbitrary size) in interrupt-driven mode. This
again costs CPU cycles (which I am not even sure are correctly
accounted for in sys_cpu). And then on top of that - at least the
spi-bcm2835 interrupt makes a wakeup call to the workqueue whenever it
finishes a single transfer (and again the scheduler gets involved...).
That is also why I have reported a high interrupt and context-switch
rate for this driver.

So do you really think it is cheaper from the CPU perspective to take
the interrupt-driven approach? (Note also that I have set the spi queue
to run with RT priority, so the spi pump has a huge advantage in
getting CPU time...)

>
> Exactly, but this is largely orthogonal to having the client drivers
> precook the messages. You can't just discard the thread since some
> things like clock reprogramming can be sleeping but we should be using
> it less (and some of the use that is needed can be run in parallel with
> the transfers of other messages if we build up a queue).

Ok - the clock setting is possibly valid for some SPI devices, but not
for all. And we all know that a "one size fits all" approach does not
scale in performance. Also, different devices may have different
feature sets, where in part we need to fall back to something else (the
thread).

At some point I had been playing with the idea of doing just that -
having a spi_pump handle things like delays and work around stupid
"bugs"... but by still trying to do as much as possible within the DMA
I found an approach that covers everything the API currently offers.
But I have to admit that it required a lot of effort to come up with
something that works (and it also produces a lot more code), and it
uncovered more HW issues than expected (I have found 4 so far)...

> The 40us is definitely somewhat interesting though I'd be interested to
> know how that compares with PIO too.

Some basic facts about ONLY the SPI transfers themselves - CS down to
last CS up for the 5-transfer message: 71us with PIO, 16us with DMA. So
that is ONLY measured on the SPI bus!

The time lost here is between the spi_transfers, which is typically
8-12us each. But a lot of time is also lost between the last clock and
CS up with PIO: 19us.

The way the driver is written, its interrupt gets called when it has
finished a transfer (or when the FIFO buffer needs refilling), and when
there is no more data it wakes up the message pump - which in this case
is surprisingly fast at scheduling the next message. You see where PIO
loses its time?

That is also the reason why I want to move back to the "transfer"
interface and see how much this improves the driver.

Ciao,
	Martin
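P.S.: for clarity, the per-transfer wakeup pattern I am talking about
looks roughly like this (a simplified sketch of the general pattern,
NOT the actual spi-bcm2835 source - the struct and helper names are
made up):

#include <linux/interrupt.h>
#include <linux/completion.h>

struct my_spi {				/* hypothetical driver state */
	struct completion done;
	/* ... FIFO/transfer bookkeeping ... */
};

static irqreturn_t my_spi_irq(int irq, void *dev_id)
{
	struct my_spi *bs = dev_id;

	/* (refill the FIFO here if the current transfer is not done) */

	/* transfer finished: wake the message pump, which only then
	 * reconfigures the registers for the next spi_transfer - this
	 * round trip through interrupt and scheduler is where the
	 * 8-12us between transfers (and the 19us before CS goes up)
	 * are spent */
	complete(&bs->done);
	return IRQ_HANDLED;
}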