From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin Sperl
Subject: Re: Depreciated spi_master.transfer and "prepared spi messages" for an optimized pipelined-SPI-DMA-driver
Date: Fri, 8 Nov 2013 18:31:37 +0100
Message-ID: <5F70E708-89B9-4DCF-A31A-E688BAA0E062@sperl.org>
References: <20131106094854.GF11602@sirena.org.uk>
 <844EDAEA-3FDC-48D0-B59E-CECC0A83761E@sperl.org>
 <20131106113219.GJ11602@sirena.org.uk>
 <20131106162410.GB2674@sirena.org.uk>
 <3B0EDE3F-3386-4879-8D89-2E4577860073@sperl.org>
 <20131106232605.GC2674@sirena.org.uk>
 <72D635F5-4229-4D78-8AA3-1392D5D80127@sperl.org>
 <20131107203127.GB2493@sirena.org.uk>
 <86AE15B6-05AF-4EFF-8B8F-10806A7C148B@sperl.org>
 <20131108161957.GP2493@sirena.org.uk>
Mime-Version: 1.0 (Apple Message framework v1283)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8BIT
Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Mark Brown
Return-path:
In-Reply-To: <20131108161957.GP2493-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
Sender: linux-spi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:

Hi Mark!

On 08.11.2013, at 17:19, Mark Brown wrote:
>
> I'd want to see strong numbers from a real use case showing that the
> complexity of trying to do this was worth it.
>

I remember having shared all sorts of values in my earlier posts regarding
absolute measurements:
* from the CPU utilization needed to receive 3000 CAN messages/s
* to latency (interrupt to SPI message)
* to the time spent "preparing" a message.

If you go back to my mail of 4 November you will find this:

On 04.11.2013, at 18:33, Martin Sperl wrote:
>
> So here a link of an example showing how much the prepared spi_messages
> really can improve the SPI thruput - even without changing anything
> (besides) preparing the messages for direct DMA use.
>
> http://www.raspberrypi.org/phpBB3/viewtopic.php?f=44&t=19489&p=448328#p448328
> The link goes to the RPI forum and includes the measurements plus some images
> showing both cases (configured via a module parameter for now)
>
> Quick summary:
> For my test-usecase by just enabling "prepared messages" I have reduced the
> time for a "simple" transfer from 390us (without prepare) to 230us (with
> prepare) and the driver is still using the threaded message_pump.
>
> OK - the "default" spi-bcm2835.c driver currently in the mainline takes 245us
> for the same thing, but it runs at 130k interrupts/s and 95% System load,
> which is quite impractical to do anything else.
> So getting this down to 80% System load plus shorter responses is already
> quite an improvement.

I included a lot more "hard" data in my previous emails - if you want to go
back to those...

The above link also includes images taken with the logic analyzer, comparing
the prepared and unprepared code-base (by simply enabling/disabling prepare).

It also does not yet make use of "DMA chaining", for which I need to move
back to the transfer interface to get the most out of DMA...

Is this "strong" enough data for a "real" (worst) case?

And this 3200 messages/s on the CAN bus is not really the worst case, as the
message is 8 bytes with an extended ID at 500KHz. It is more like the "best"
"worst" case, as such a message takes between 258 and 316us (depending on
bit-stuffing). This translates to 3164 to 3875 CAN messages per second.

But if I were just using 11-bit IDs and 0 bytes of data, a single message
would take 94 to 111us, which translates to between 9009 and 10638 CAN
messages/s.

And then you can double those values again if you go to 1MHz CAN bus speed.
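For reference, the conversion from frame time to message rate used above can
be checked with a few lines of Python. This is only a sketch: the function
and variable names are made up for illustration, and the frame times are
simply the figures quoted above for a 500KHz bus, not new measurements.

```python
# Convert CAN frame durations (microseconds) into frames per second on a
# fully saturated bus. All names here are illustrative, not from any driver.

def frames_per_second(frame_time_us, speedup=1):
    """Whole frames per second; `speedup` scales the bus clock (2 = 1MHz)."""
    return (1_000_000 * speedup) // frame_time_us

def rate_range(times_us, speedup=1):
    """Return (lowest, highest) rate for a (shortest, longest) frame time."""
    shortest, longest = times_us
    return (frames_per_second(longest, speedup),
            frames_per_second(shortest, speedup))

# (shortest, longest) frame time in us at 500KHz; the spread is bit-stuffing.
FRAME_TIMES_US = {
    "extended ID, 8 data bytes": (258, 316),
    "11-bit ID, 0 data bytes": (94, 111),
}

if __name__ == "__main__":
    for name, times in FRAME_TIMES_US.items():
        for label, speedup in (("500KHz", 1), ("1MHz", 2)):
            low, high = rate_range(times, speedup)
            print(f"{name} @ {label}: {low} to {high} messages/s")
```

At 500KHz this reproduces the 3164 to 3875 and 9009 to 10638 messages/s
ranges given above; the 1MHz rows are just those frame times halved.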
Also you can easily "trigger" such a situation, if you:
* have a device that sends a message (and its controller is set to resend
  the message if there is no recipient, which would be typical)
* have the CAN controller on the RPI configured for "listen only mode"
  (that means it does not acknowledge messages, which is a prerequisite)
* and you connect those two devices together with no other device.

Just start up the "sending" device and you get 80% SYSTEM CPU with the DMA
driver (95% with the PIPO driver + packet-loss)!

The effect is similar to a packet storm on a network with an older network
card whose driver cannot switch from "interrupt" to "polling" mode...

The only difference from normal traffic is that, with normal traffic, the
bus would not be saturated 100% of the time over any measured interval - you
would see idle gaps.

Need more evidence? If so, please tell me what you would like to see...

Ciao,
	Martin

P.S.: and I cannot understand why I can read a 1MHz CAN bus saturated with
0 byte length messages on a simple 8-bit AVR at 16MHz with 4kb memory...
(with the same CAN controller). And that microcontroller still has the time
to write the whole stream to an SD card (with all the non-deterministic
latencies that an SD card introduces - but without a filesystem, I have to
admit, so similar to "dd of=/dev/sdd").
So that is why I am trying to optimize the linux driver as well, to get to
"better" performance and still have some cycles left for doing the "real"
work...
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html