* [PATCH] added API: spi_message_compile/optimize/hint/... @ 2013-11-19 18:33 Martin Sperl [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Martin Sperl @ 2013-11-19 18:33 UTC (permalink / raw) To: linux-spi-u79uwXL29TY76Z2rM5mHXA; +Cc: Mark Brown Hi! Here the patch for a spi_message_compile interface (as discussed in a separate thread). Naming of the API is subject to discussion - instead of "compile" it could also be "optimize" or "hint" or ... The patch creates: * 3 additional variables in spi_message * 2 additional function pointers in spi_master * spi_message_compile/uncompile using the above * __spi_async modified, so that the "validation" step for the message does not run every time. Here some measurements using my "standard" CAN test: * runs on Raspberry Pi without over-clocking * optimized mcp2515 driver with 3 Message chunks and callbacks running in listen only mode * CAN bus is running at 125kHz (not 500 as before to simulate "less" load - 25% to be exact) * single can device retransmitting single frame repeatedly. (every 2.1s there is a gap of 15ms when no traffic happens - device reboots after watchdog timeout) * spi-bcm2835 stock kernel driver is used with patch to run message pump with realtime priority. * kernel 3.10.16 (foundation based, as the vanilla RPI kernel still misses dmaengine, so it is easier to develop against this kernel) Main metrics reported are taken with "vmstat 10 12" spi-bcm2835 realtime=0, unpatched spi.h, mcp2515 without any "compiles": Messages received: 178523 Messages in second HW buffer: 20 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 411516 14380 26436 0 0 588 19 1340 653 12 31 37 20 0 0 0 411232 14388 26436 0 0 0 4 45880 9412 0 35 61 4 0 0 0 411104 14388 26436 0 0 0 0 45843 9378 0 34 66 0 0 0 0 411040 14388 26436 0 0 0 8 45803 9528 0 41 59 0 0 0 0 410816 14396 26436 0 0 0 6 45703 9394 0 38 53 9 0 0 0 410624 14396 26436 0 0 0 0 45623 9270 0 33 67 0 0 0 0 410560 14396 26436 0 0 0 0 45793 9335 0 34 66 0 0 0 0 410464 14396 26436 0 0 0 3 45786 9367 0 35 65 0 0 0 0 410464 14396 26436 0 0 0 0 45741 9364 0 35 65 0 0 0 0 410496 14396 26436 0 0 0 0 45772 9360 0 35 65 0 0 0 0 410464 14396 26436 0 0 0 0 45731 9309 0 34 66 0 0 0 0 410528 14396 26436 0 0 0 0 45726 9287 0 34 66 0 spi-bcm2835 realtime=1 unpatched spi.h, mcp2515 without any "compiles": Messages received: 178031 Messages in second HW buffer: 4 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 411512 14408 26440 0 0 57 2 7223 1497 1 8 89 2 0 0 0 411384 14416 26440 0 0 0 2 45815 9827 0 34 61 5 0 0 0 411288 14416 26440 0 0 0 0 45778 9822 0 33 67 0 0 0 0 411128 14416 26440 0 0 0 2 45630 9807 0 35 65 0 0 0 0 411064 14424 26440 0 0 0 6 45705 10029 0 40 54 6 0 0 0 411032 14424 26440 0 0 0 0 45628 9807 0 33 67 0 0 0 0 410840 14424 26440 0 0 0 0 45611 9763 0 33 67 0 0 0 0 410744 14424 26440 0 0 0 0 45544 9716 0 33 67 0 0 0 0 410712 14424 26440 0 0 0 0 45574 9756 0 33 67 0 0 0 0 410712 14424 26440 0 0 0 0 45660 9807 0 33 67 0 0 0 0 410712 14424 26440 0 0 0 0 45494 9768 0 33 67 0 0 0 0 410712 14424 26440 0 0 0 0 45549 9778 0 34 66 0 spi-bcm2835 realtime=0 patched spi.h, mcp2515 without any "compiles": Messages received: 178378 Messages in second HW buffer: 6 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 355064 20460 71900 0 0 182 20 8400 1786 15 17 57 11 0 0 0 355180 20468 71900 0 0 0 2 45380 8788 0 33 59 8 0 0 0 355116 20468 71900 0 0 0 0 46032 8863 0 33 67 0 1 0 0 355020 20468 71900 0 0 0 2 45768 8865 0 36 64 0 0 0 0 354924 20476 71900 0 0 0 6 45749 8869 0 39 51 10 0 0 0 354828 20476 71900 0 0 0 0 45452 8747 0 33 67 0 0 0 0 354732 20476 71900 0 0 0 0 45644 8757 0 33 67 0 0 0 0 354540 20476 71900 0 0 0 0 45404 8746 0 33 67 0 0 0 0 354508 20476 71900 0 0 0 0 45678 8739 0 32 68 0 0 0 0 354548 20476 71900 0 0 0 0 45959 8804 0 33 67 0 0 0 0 354516 20476 71900 0 0 0 0 45984 8831 0 33 67 0 0 0 0 354548 20476 71900 0 0 0 0 45925 8835 0 34 66 0 spi-bcm2835 realtime=1 patched spi.h, mcp2515 without any "compiles": Messages received: 177648 Messages in second HW buffer: 6 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 355456 20488 71908 0 0 116 13 11779 2372 10 15 68 7 0 0 0 355296 20496 71908 0 0 0 2 45963 8916 0 35 61 3 0 0 0 355168 20496 71908 0 0 0 0 45879 8913 0 35 65 0 0 0 0 355104 20496 71908 0 0 0 2 45803 8880 0 36 64 0 0 0 0 355008 20504 71908 0 0 0 8 45352 8888 0 41 48 11 0 0 0 354880 20504 71908 0 0 0 0 45424 8846 0 36 64 0 0 0 0 354816 20504 71908 0 0 0 0 45247 8804 0 34 66 0 0 0 0 354720 20504 71908 0 0 0 0 45532 8824 0 35 65 0 0 0 0 354688 20504 71908 0 0 0 0 45553 8826 0 36 64 0 0 0 0 354720 20504 71908 0 0 0 0 45490 8834 0 34 66 0 0 0 0 354688 20504 71908 0 0 0 0 45673 8880 0 35 65 0 0 0 0 354688 20504 71908 0 0 0 0 45041 8674 0 34 66 0 spi-bcm2835 realtime=0 patched spi.h, mcp2515 with "compile": Messages received: 178493 Messages in second HW buffer: 38 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 353868 20792 73116 0 0 74 9 15549 3084 8 16 71 5 0 0 0 353804 20800 73116 0 0 0 2 45717 9051 0 32 63 5 0 0 0 353676 20800 73116 0 0 0 0 45844 9044 0 33 67 0 0 0 0 353620 20800 73116 0 0 0 1 45829 9120 0 36 64 0 0 0 0 353588 20804 73116 0 0 0 2 45885 9164 0 37 58 5 0 0 0 353460 20804 73116 0 0 0 0 45737 9070 0 32 68 0 0 0 0 353364 20804 73116 0 0 0 0 45752 9056 0 31 69 0 0 0 0 353204 20804 73116 0 0 0 0 45847 9078 0 33 67 0 0 0 0 353140 20804 73116 0 0 0 0 45629 9035 0 33 67 0 0 0 0 353108 20804 73116 0 0 0 0 45677 9052 0 32 68 0 0 0 0 353140 20804 73116 0 0 0 0 45692 9025 0 33 67 0 0 0 0 353172 20804 73116 0 0 0 0 45687 9023 0 33 67 0 spi-bcm2835 realtime=1 patched spi.h, mcp2515 with "compile": Messages received: 177623 Messages in second HW buffer: 3 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 353964 20760 73112 0 0 87 10 13506 2687 9 15 70 6 0 0 0 353836 20768 73112 0 0 0 2 45629 8960 0 34 63 4 0 0 0 353740 20768 73112 0 0 0 0 45360 8920 0 33 67 0 0 0 0 353676 20768 73112 0 0 0 2 45660 9033 0 35 65 0 0 0 0 353580 20776 73112 0 0 0 6 45714 9007 0 36 57 6 0 0 0 353452 20776 73112 0 0 0 0 45489 8897 0 32 68 0 0 0 0 353292 20776 73112 0 0 0 0 45615 8976 0 33 67 0 0 0 0 353164 20776 73112 0 0 0 0 45467 8942 0 34 66 0 0 0 0 353164 20776 73112 0 0 0 0 45464 8933 0 32 68 0 0 0 0 353196 20776 73112 0 0 0 0 45535 8937 0 33 67 0 0 0 0 353196 20776 73112 0 0 0 0 45329 8916 0 33 67 0 0 0 0 353228 20776 73112 0 0 0 0 45610 8953 0 33 67 0 So in summary: * there is no regression in performance with the patch * The improved locking does not show any real difference between installations, but then we would nto be expecting an interrupt while running spi_async (at least not with this driver). * there is a slight indication that System CPU values are slightly lower as we have 31% and 32% System CPU that we have not seen in the tests prior to applying the patch. This is thus related to the optimization of avoiding the repeated message checks in case of an optimized driver. So this has a slightly positive impact on overall performance. we could add a means to optimize all those spi_write/read/write_then_read calls by compiling them. But that depends still on what we want to implement. Assumptions by default should be: * is_dma_mapped is set * no change to structure, length, speed, bits, delays are allowed. We may think of adding either flags for "global" changes like: * allow length of transfer to change * allow the dma_address of the transfer to change (so still with is_dma_mapped) * allow the address of the transfer to change (would mean overhead for mapping dma on a per call basis) >From an optimization perspective it might be better to give "hints" to what may change per spi_transfer, so adding a "u32 compile_hint" to spi_transfer might be preferable to make it work best. Please review and comment. Thanks, Martin P.s: The patch itself - it is against 3.10.16, so it might not apply cleanly... diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 32b7bb1..8f1aede 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1346,7 +1346,18 @@ int spi_setup(struct spi_device *spi) } EXPORT_SYMBOL_GPL(spi_setup); -static int __spi_async(struct spi_device *spi, struct spi_message *message) +/** + * spi_message_verify - verifies if the spi message is supported + * by the bus-master + * @spi: device with which data will be exchanged + * @message: describes the data transfers, including completion callback + * Context: any (irqs may be blocked, etc) + * + * This call may be used in_irq and other contexts which can't sleep, + * as well as from task contexts which can sleep. + */ +extern int spi_message_verify(struct spi_device *spi, + struct spi_message *message) { struct spi_master *master = spi->master; struct spi_transfer *xfer; @@ -1388,6 +1399,20 @@ static int __spi_async(struct spi_device *spi, struct spi_message *message) return -EINVAL; } } + return 0; +} +EXPORT_SYMBOL_GPL(spi_message_verify); + +static int __spi_async(struct spi_device *spi, struct spi_message *message) +{ + struct spi_master *master = spi->master; + int ret; + + if (!message->is_compiled) { + ret=spi_message_verify(spi,message); + if (ret) + return ret; + } message->spi = spi; message->status = -EINPROGRESS; @@ -1430,15 +1455,12 @@ int spi_async(struct spi_device *spi, struct spi_message *message) unsigned long flags; spin_lock_irqsave(&master->bus_lock_spinlock, flags); - - if (master->bus_lock_flag) - ret = -EBUSY; - else - ret = __spi_async(spi, message); - + ret=master->bus_lock_flag; spin_unlock_irqrestore(&master->bus_lock_spinlock, flags); + if (ret) + return -EBUSY; - return ret; + return __spi_async(spi, message); } EXPORT_SYMBOL_GPL(spi_async); diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 6ff26c8..cea622f 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -378,6 +378,14 @@ struct spi_master { int (*unprepare_transfer_hardware)(struct spi_master *master); /* gpio chip select */ int *cs_gpios; + /* compile (static) message handlers */ + int (*message_compile)(struct spi_master *master, + struct spi_message *mesg, + u32 flags + ); + void (*message_uncompile)(struct spi_master *master, + struct spi_message *mesg + ); }; static inline void *spi_master_get_devdata(struct spi_master *master) @@ -532,6 +540,12 @@ struct spi_transfer { * @spi: SPI device to which the transaction is queued * @is_dma_mapped: if true, the caller provided both dma and cpu virtual * addresses for each transfer buffer + * @is_compiled: if true, then the message has been initialized + * via spi_compile_message + * @compiled_flags: flags given when the message got compiled + * via spi_message_compile + * @compiled_data: data owned by bus-master driver between + * spi_message_compile and spi_message_uncompile * @complete: called to report transaction completions * @context: the argument to complete() when it's called * @actual_length: the total number of bytes that were transferred in all @@ -561,6 +575,13 @@ struct spi_message { unsigned is_dma_mapped:1; + /* bus-driver owned data for initialized messages + * between spi_message_compile and spi_message_uncompile + */ + unsigned is_compiled:1; + u32 compiled_flags; + void *compiled_data; + /* REVISIT: we might want a flag affecting the behavior of the * last transfer ... allowing things like "read 16 bit length L" * immediately followed by "read L bytes". Basically imposing @@ -604,6 +625,41 @@ spi_transfer_del(struct spi_transfer *t) list_del(&t->transfer_list); } +extern int spi_message_verify(struct spi_device *spi, + struct spi_message *m); + +static inline int spi_message_compile(struct spi_device *spi, + struct spi_message *m, + u32 flags) +{ + int verify; + if (m->is_compiled) + return -EOPNOTSUPP; + + if ((verify=spi_message_verify(spi,m))) + return verify; + + m->is_compiled=1; + m->compiled_flags=flags; + m->compiled_data=NULL; + + if (spi->master->message_compile) + return spi->master->message_compile(spi->master,m,flags); + else + return 0; +} + +static inline void spi_message_uncompile(struct spi_device *spi, + struct spi_message *m) +{ + if (spi->master->message_uncompile) + spi->master->message_uncompile(spi->master,m); + + m->is_compiled=0; + m->compiled_flags=0; + m->compiled_data=NULL; +} + /** * spi_message_init_with_transfers - Initialize spi_message and append transfers * @m: spi_message to be initialized -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 9+ messages in thread
[parent not found: <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>]
* Re: [PATCH] added API: spi_message_compile/optimize/hint/... [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> @ 2013-11-19 19:25 ` Mark Brown 2013-11-20 9:11 ` martin sperl 2013-11-20 11:29 ` Mark Brown 2 siblings, 0 replies; 9+ messages in thread From: Mark Brown @ 2013-11-19 19:25 UTC (permalink / raw) To: Martin Sperl; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 401 bytes --] On Tue, Nov 19, 2013 at 07:33:14PM +0100, Martin Sperl wrote: > Here the patch for a spi_message_compile interface > (as discussed in a separate thread). Not looked at the patch yet (and I'm about to finish for today so it'll be tomorrow) but it seems you forgot the Signed-off-by - I need that to be able to apply if everything is OK. See SubmittingPatches for details of what it means. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] added API: spi_message_compile/optimize/hint/... [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> 2013-11-19 19:25 ` Mark Brown @ 2013-11-20 9:11 ` martin sperl 2013-11-20 11:29 ` Mark Brown 2 siblings, 0 replies; 9+ messages in thread From: martin sperl @ 2013-11-20 9:11 UTC (permalink / raw) To: linux-spi-u79uwXL29TY76Z2rM5mHXA; +Cc: Mark Brown Here the missing sign-off: Signed-Off-By: Martin Sperl <kernel-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> Is this sufficient or do you need to send the patch again? By the way: I am thinking of extending the spi_transfer with another set of compiled_flags, so that individual spi_transfers may get marked as "modifiable". Adding also something like the following defines: #define SPI_MESSAGE_COMPILE_MODIFIES_RXBUF (1<<0) #define SPI_MESSAGE_COMPILE_MODIFIES_TXBUF (1<<1) #define SPI_MESSAGE_COMPILE_MODIFIES_RXDMA ((1<<2)|(1<<0)) #define SPI_MESSAGE_COMPILE_MODIFIES_TXDMA ((1<<3)|(1<<1)) #define SPI_MESSAGE_COMPILE_MODIFIES_LEN (1<<4) #define SPI_MESSAGE_COMPILE_MODIFIES_DELAY (1<<5) #define SPI_MESSAGE_COMPILE_IN_ATOMIC_CONTEXT (1<<30) #define SPI_MESSAGE_COMPILE_FLAGS_SET_IN_XFER (1<<31) Note that SPI_MESSAGE_COMPILE_FLAGS_SET_IN XFER is there mainly to allow copying the default flags from spi_message_compile to the individual transfers if xfer.compile_flags=0. This copy would happen in spi_message_verify (formerly in __spi_async). adding the following in the list_for_each_entry: if (!xfer->compile_flags) xfer->compile_flags = message->compiled_flags; (but it would also require a slight change in spi_message_compile to set message->compiled_flags before calling spi_message_verify. That would allow spi_read, spi_write, spi_write_then_read to define its "variation" in the structure and then compile the message. Which would give a slight boost - and if I think of it it may make things faster for a dma driver, as there could be some "shortcuts"... for the different cases that could be made quite efficient if it can rely on the above hints... (essentially a list with copy from X to Y and maybe the required DMA mapping computation - if necessary) That way we at least avoid the memory allocation and the computation of structures. Please tell me your preferences. Martin On 19.11.2013 19:33, Martin Sperl wrote: > Hi! > > Here the patch for a spi_message_compile interface > (as discussed in a separate thread). > > Naming of the API is subject to discussion - instead of "compile" it could > also be "optimize" or "hint" or ... > > The patch creates: > * 3 additional variables in spi_message > * 2 additional function pointers in spi_master > * spi_message_compile/uncompile using the above > * __spi_async modified, so that the "validation" step for the message > does not run every time. > > Here some measurements using my "standard" CAN test: > * runs on Raspberry Pi without over-clocking > * optimized mcp2515 driver with 3 Message chunks and callbacks running > in listen only mode > * CAN bus is running at 125kHz (not 500 as before to simulate "less" load > - 25% to be exact) > * single can device retransmitting single frame repeatedly. (every 2.1s there > is a gap of 15ms when no traffic happens - device reboots after watchdog > timeout) > * spi-bcm2835 stock kernel driver is used with patch to run message pump with > realtime priority. > * kernel 3.10.16 (foundation based, as the vanilla RPI kernel still misses > dmaengine, so it is easier to develop against this kernel) > > Main metrics reported are taken with "vmstat 10 12" > > spi-bcm2835 realtime=0, unpatched spi.h, mcp2515 without any "compiles": > Messages received: 178523 > Messages in second HW buffer: 20 > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 411516 14380 26436 0 0 588 19 1340 653 12 31 37 20 > 0 0 0 411232 14388 26436 0 0 0 4 45880 9412 0 35 61 4 > 0 0 0 411104 14388 26436 0 0 0 0 45843 9378 0 34 66 0 > 0 0 0 411040 14388 26436 0 0 0 8 45803 9528 0 41 59 0 > 0 0 0 410816 14396 26436 0 0 0 6 45703 9394 0 38 53 9 > 0 0 0 410624 14396 26436 0 0 0 0 45623 9270 0 33 67 0 > 0 0 0 410560 14396 26436 0 0 0 0 45793 9335 0 34 66 0 > 0 0 0 410464 14396 26436 0 0 0 3 45786 9367 0 35 65 0 > 0 0 0 410464 14396 26436 0 0 0 0 45741 9364 0 35 65 0 > 0 0 0 410496 14396 26436 0 0 0 0 45772 9360 0 35 65 0 > 0 0 0 410464 14396 26436 0 0 0 0 45731 9309 0 34 66 0 > 0 0 0 410528 14396 26436 0 0 0 0 45726 9287 0 34 66 0 > > spi-bcm2835 realtime=1 unpatched spi.h, mcp2515 without any "compiles": > Messages received: 178031 > Messages in second HW buffer: 4 > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 411512 14408 26440 0 0 57 2 7223 1497 1 8 89 2 > 0 0 0 411384 14416 26440 0 0 0 2 45815 9827 0 34 61 5 > 0 0 0 411288 14416 26440 0 0 0 0 45778 9822 0 33 67 0 > 0 0 0 411128 14416 26440 0 0 0 2 45630 9807 0 35 65 0 > 0 0 0 411064 14424 26440 0 0 0 6 45705 10029 0 40 54 6 > 0 0 0 411032 14424 26440 0 0 0 0 45628 9807 0 33 67 0 > 0 0 0 410840 14424 26440 0 0 0 0 45611 9763 0 33 67 0 > 0 0 0 410744 14424 26440 0 0 0 0 45544 9716 0 33 67 0 > 0 0 0 410712 14424 26440 0 0 0 0 45574 9756 0 33 67 0 > 0 0 0 410712 14424 26440 0 0 0 0 45660 9807 0 33 67 0 > 0 0 0 410712 14424 26440 0 0 0 0 45494 9768 0 33 67 0 > 0 0 0 410712 14424 26440 0 0 0 0 45549 9778 0 34 66 0 > > spi-bcm2835 realtime=0 patched spi.h, mcp2515 without any "compiles": > Messages received: 178378 > Messages in second HW buffer: 6 > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 355064 20460 71900 0 0 182 20 8400 1786 15 17 57 11 > 0 0 0 355180 20468 71900 0 0 0 2 45380 8788 0 33 59 8 > 0 0 0 355116 20468 71900 0 0 0 0 46032 8863 0 33 67 0 > 1 0 0 355020 20468 71900 0 0 0 2 45768 8865 0 36 64 0 > 0 0 0 354924 20476 71900 0 0 0 6 45749 8869 0 39 51 10 > 0 0 0 354828 20476 71900 0 0 0 0 45452 8747 0 33 67 0 > 0 0 0 354732 20476 71900 0 0 0 0 45644 8757 0 33 67 0 > 0 0 0 354540 20476 71900 0 0 0 0 45404 8746 0 33 67 0 > 0 0 0 354508 20476 71900 0 0 0 0 45678 8739 0 32 68 0 > 0 0 0 354548 20476 71900 0 0 0 0 45959 8804 0 33 67 0 > 0 0 0 354516 20476 71900 0 0 0 0 45984 8831 0 33 67 0 > 0 0 0 354548 20476 71900 0 0 0 0 45925 8835 0 34 66 0 > > spi-bcm2835 realtime=1 patched spi.h, mcp2515 without any "compiles": > Messages received: 177648 > Messages in second HW buffer: 6 > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 355456 20488 71908 0 0 116 13 11779 2372 10 15 68 7 > 0 0 0 355296 20496 71908 0 0 0 2 45963 8916 0 35 61 3 > 0 0 0 355168 20496 71908 0 0 0 0 45879 8913 0 35 65 0 > 0 0 0 355104 20496 71908 0 0 0 2 45803 8880 0 36 64 0 > 0 0 0 355008 20504 71908 0 0 0 8 45352 8888 0 41 48 11 > 0 0 0 354880 20504 71908 0 0 0 0 45424 8846 0 36 64 0 > 0 0 0 354816 20504 71908 0 0 0 0 45247 8804 0 34 66 0 > 0 0 0 354720 20504 71908 0 0 0 0 45532 8824 0 35 65 0 > 0 0 0 354688 20504 71908 0 0 0 0 45553 8826 0 36 64 0 > 0 0 0 354720 20504 71908 0 0 0 0 45490 8834 0 34 66 0 > 0 0 0 354688 20504 71908 0 0 0 0 45673 8880 0 35 65 0 > 0 0 0 354688 20504 71908 0 0 0 0 45041 8674 0 34 66 0 > > spi-bcm2835 realtime=0 patched spi.h, mcp2515 with "compile": > Messages received: 178493 > Messages in second HW buffer: 38 > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 353868 20792 73116 0 0 74 9 15549 3084 8 16 71 5 > 0 0 0 353804 20800 73116 0 0 0 2 45717 9051 0 32 63 5 > 0 0 0 353676 20800 73116 0 0 0 0 45844 9044 0 33 67 0 > 0 0 0 353620 20800 73116 0 0 0 1 45829 9120 0 36 64 0 > 0 0 0 353588 20804 73116 0 0 0 2 45885 9164 0 37 58 5 > 0 0 0 353460 20804 73116 0 0 0 0 45737 9070 0 32 68 0 > 0 0 0 353364 20804 73116 0 0 0 0 45752 9056 0 31 69 0 > 0 0 0 353204 20804 73116 0 0 0 0 45847 9078 0 33 67 0 > 0 0 0 353140 20804 73116 0 0 0 0 45629 9035 0 33 67 0 > 0 0 0 353108 20804 73116 0 0 0 0 45677 9052 0 32 68 0 > 0 0 0 353140 20804 73116 0 0 0 0 45692 9025 0 33 67 0 > 0 0 0 353172 20804 73116 0 0 0 0 45687 9023 0 33 67 0 > > spi-bcm2835 realtime=1 patched spi.h, mcp2515 with "compile": > Messages received: 177623 > Messages in second HW buffer: 3 > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- > r b swpd free buff cache si so bi bo in cs us sy id wa > 0 0 0 353964 20760 73112 0 0 87 10 13506 2687 9 15 70 6 > 0 0 0 353836 20768 73112 0 0 0 2 45629 8960 0 34 63 4 > 0 0 0 353740 20768 73112 0 0 0 0 45360 8920 0 33 67 0 > 0 0 0 353676 20768 73112 0 0 0 2 45660 9033 0 35 65 0 > 0 0 0 353580 20776 73112 0 0 0 6 45714 9007 0 36 57 6 > 0 0 0 353452 20776 73112 0 0 0 0 45489 8897 0 32 68 0 > 0 0 0 353292 20776 73112 0 0 0 0 45615 8976 0 33 67 0 > 0 0 0 353164 20776 73112 0 0 0 0 45467 8942 0 34 66 0 > 0 0 0 353164 20776 73112 0 0 0 0 45464 8933 0 32 68 0 > 0 0 0 353196 20776 73112 0 0 0 0 45535 8937 0 33 67 0 > 0 0 0 353196 20776 73112 0 0 0 0 45329 8916 0 33 67 0 > 0 0 0 353228 20776 73112 0 0 0 0 45610 8953 0 33 67 0 > > So in summary: > * there is no regression in performance with the patch > * The improved locking does not show any real difference between > installations, but then we would nto be expecting an interrupt while running > spi_async (at least not with this driver). > * there is a slight indication that System CPU values are slightly lower as we > have 31% and 32% System CPU that we have not seen in the tests prior to > applying the patch. This is thus related to the optimization of avoiding the > repeated message checks in case of an optimized driver. > > So this has a slightly positive impact on overall performance. > we could add a means to optimize all those spi_write/read/write_then_read > calls by compiling them. But that depends still on what we want to implement. > > Assumptions by default should be: > * is_dma_mapped is set > * no change to structure, length, speed, bits, delays are allowed. > > We may think of adding either flags for "global" changes like: > * allow length of transfer to change > * allow the dma_address of the transfer to change (so still with > is_dma_mapped) > * allow the address of the transfer to change (would mean overhead for > mapping dma on a per call basis) > > From an optimization perspective it might be better to give "hints" to what > may change per spi_transfer, so adding a "u32 compile_hint" to spi_transfer > might be preferable to make it work best. > > Please review and comment. > > Thanks, > Martin > > P.s: The patch itself - it is against 3.10.16, so it might not apply cleanly... > > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c > index 32b7bb1..8f1aede 100644 > --- a/drivers/spi/spi.c > +++ b/drivers/spi/spi.c > @@ -1346,7 +1346,18 @@ int spi_setup(struct spi_device *spi) > } > EXPORT_SYMBOL_GPL(spi_setup); > > -static int __spi_async(struct spi_device *spi, struct spi_message *message) > +/** > + * spi_message_verify - verifies if the spi message is supported > + * by the bus-master > + * @spi: device with which data will be exchanged > + * @message: describes the data transfers, including completion callback > + * Context: any (irqs may be blocked, etc) > + * > + * This call may be used in_irq and other contexts which can't sleep, > + * as well as from task contexts which can sleep. > + */ > +extern int spi_message_verify(struct spi_device *spi, > + struct spi_message *message) > { > struct spi_master *master = spi->master; > struct spi_transfer *xfer; > @@ -1388,6 +1399,20 @@ static int __spi_async(struct spi_device *spi, struct spi_message *message) > return -EINVAL; > } > } > + return 0; > +} > +EXPORT_SYMBOL_GPL(spi_message_verify); > + > +static int __spi_async(struct spi_device *spi, struct spi_message *message) > +{ > + struct spi_master *master = spi->master; > + int ret; > + > + if (!message->is_compiled) { > + ret=spi_message_verify(spi,message); > + if (ret) > + return ret; > + } > > message->spi = spi; > message->status = -EINPROGRESS; > @@ -1430,15 +1455,12 @@ int spi_async(struct spi_device *spi, struct spi_message *message) > unsigned long flags; > > spin_lock_irqsave(&master->bus_lock_spinlock, flags); > - > - if (master->bus_lock_flag) > - ret = -EBUSY; > - else > - ret = __spi_async(spi, message); > - > + ret=master->bus_lock_flag; > spin_unlock_irqrestore(&master->bus_lock_spinlock, flags); > + if (ret) > + return -EBUSY; > > - return ret; > + return __spi_async(spi, message); > } > EXPORT_SYMBOL_GPL(spi_async); > > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h > index 6ff26c8..cea622f 100644 > --- a/include/linux/spi/spi.h > +++ b/include/linux/spi/spi.h > @@ -378,6 +378,14 @@ struct spi_master { > int (*unprepare_transfer_hardware)(struct spi_master *master); > /* gpio chip select */ > int *cs_gpios; > + /* compile (static) message handlers */ > + int (*message_compile)(struct spi_master *master, > + struct spi_message *mesg, > + u32 flags > + ); > + void (*message_uncompile)(struct spi_master *master, > + struct spi_message *mesg > + ); > }; > > static inline void *spi_master_get_devdata(struct spi_master *master) > @@ -532,6 +540,12 @@ struct spi_transfer { > * @spi: SPI device to which the transaction is queued > * @is_dma_mapped: if true, the caller provided both dma and cpu virtual > * addresses for each transfer buffer > + * @is_compiled: if true, then the message has been initialized > + * via spi_compile_message > + * @compiled_flags: flags given when the message got compiled > + * via spi_message_compile > + * @compiled_data: data owned by bus-master driver between > + * spi_message_compile and spi_message_uncompile > * @complete: called to report transaction completions > * @context: the argument to complete() when it's called > * @actual_length: the total number of bytes that were transferred in all > @@ -561,6 +575,13 @@ struct spi_message { > > unsigned is_dma_mapped:1; > > + /* bus-driver owned data for initialized messages > + * between spi_message_compile and spi_message_uncompile > + */ > + unsigned is_compiled:1; > + u32 compiled_flags; > + void *compiled_data; > + > /* REVISIT: we might want a flag affecting the behavior of the > * last transfer ... allowing things like "read 16 bit length L" > * immediately followed by "read L bytes". Basically imposing > @@ -604,6 +625,41 @@ spi_transfer_del(struct spi_transfer *t) > list_del(&t->transfer_list); > } > > +extern int spi_message_verify(struct spi_device *spi, > + struct spi_message *m); > + > +static inline int spi_message_compile(struct spi_device *spi, > + struct spi_message *m, > + u32 flags) > +{ > + int verify; > + if (m->is_compiled) > + return -EOPNOTSUPP; > + > + if ((verify=spi_message_verify(spi,m))) > + return verify; > + > + m->is_compiled=1; > + m->compiled_flags=flags; > + m->compiled_data=NULL; > + > + if (spi->master->message_compile) > + return spi->master->message_compile(spi->master,m,flags); > + else > + return 0; > +} > + > +static inline void spi_message_uncompile(struct spi_device *spi, > + struct spi_message *m) > +{ > + if (spi->master->message_uncompile) > + spi->master->message_uncompile(spi->master,m); > + > + m->is_compiled=0; > + m->compiled_flags=0; > + m->compiled_data=NULL; > +} > + > /** > * spi_message_init_with_transfers - Initialize spi_message and append transfers > * @m: spi_message to be initialized > > -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] added API: spi_message_compile/optimize/hint/... [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> 2013-11-19 19:25 ` Mark Brown 2013-11-20 9:11 ` martin sperl @ 2013-11-20 11:29 ` Mark Brown [not found] ` <20131120112920.GX2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> 2 siblings, 1 reply; 9+ messages in thread From: Mark Brown @ 2013-11-20 11:29 UTC (permalink / raw) To: Martin Sperl; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 3062 bytes --] On Tue, Nov 19, 2013 at 07:33:14PM +0100, Martin Sperl wrote: > Here the patch for a spi_message_compile interface > (as discussed in a separate thread). As well as the signoff please try to follow the other stuff in SubmittingPatches, both in terms of the formatting of the patch and in terms of things like coding style - checkpatch is a useful tool for checking this stuff. The shape of the patch looks good, but there's a few issues below. > Naming of the API is subject to discussion - instead of "compile" it could > also be "optimize" or "hint" or ... Hrm. I don't really like any of these options but can't think of anything else right now. Perhaps optimize is the best of the options here. > The patch creates: > * 3 additional variables in spi_message > * 2 additional function pointers in spi_master Can you keep the master interface in a separate patch please? I'm not sure if we want to do this or if we want to push more of the work to the core, especially given the desire to factor the generic bits of DMA work out of drivers, but the external interface seems very clear now so we should be able to get that merged more easily (and then things like the mcp2515 can start building on that). For the master interface I think I'd like to at least see a patch adding a driver using it go in along with the core changes. This is probably going to be a bit difficult right now given the situation with the Raspberry Pi kernels unfortunately. > Here some measurements using my "standard" CAN test: > * runs on Raspberry Pi without over-clocking > * optimized mcp2515 driver with 3 Message chunks and callbacks running > in listen only mode Is this device part of the vanilla board out of interest? > * kernel 3.10.16 (foundation based, as the vanilla RPI kernel still misses > dmaengine, so it is easier to develop against this kernel) Patches really need to be sent against the latest code for application - this is going to collide with the factoring of the validation out of the lock that I did too. Normally that should be git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next though until the merge window is over that'll need to be topic/core. > +/** > + * spi_message_verify - verifies if the spi message is supported > + * by the bus-master > + * @spi: device with which data will be exchanged > + * @message: describes the data transfers, including completion callback > + * Context: any (irqs may be blocked, etc) > + * > + * This call may be used in_irq and other contexts which can't sleep, > + * as well as from task contexts which can sleep. > + */ > +extern int spi_message_verify(struct spi_device *spi, > + struct spi_message *message) This stuff does more than verify, it also initialises the message with defaults - if it were really a verify() then it'd be able to have the message passed as a const. > + if ((verify=spi_message_verify(spi,m))) > + return verify; Assign and check the result separately. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20131120112920.GX2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>]
* Re: [PATCH] added API: spi_message_compile/optimize/hint/... [not found] ` <20131120112920.GX2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> @ 2013-11-20 15:27 ` Martin Sperl [not found] ` <05C64EF6-0F3B-4CBC-A462-45ED8F1564A7-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Martin Sperl @ 2013-11-20 15:27 UTC (permalink / raw) To: Mark Brown; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA OK - I will supply it as 2 patches. So it is spi_message_optimize - unless you can think of something different. As mentioned before I will take the opportunity to move the flags into spi_transfer for optimization hints, because there is where the change really happens... The current bits come to my mind: #define SPI_OPTIMIZE_FIXED_TX_BUF (1<<0) #define SPI_OPTIMIZE_FIXED_TX_DMA ((1<<1)|(1<<0)) #define SPI_OPTIMIZE_FIXED_RX_BUF (1<<2) #define SPI_OPTIMIZE_FIXED_RX_DMA ((1<<3)|(1<<2)) #define SPI_OPTIMIZE_FIXED_LEN (1<<4) #define SPI_OPTIMIZE_FIXED_DELAY (1<<5) #define SPI_OPTIMIZE_FIXED_SPEED (1<<6) #define SPI_OPTIMIZE_FIXED_BITS (1<<7) #define SPI_OPTIMIZE_FIXED_ALL (-1) The flags argument now is of type gfp_t to allow for memory allocation policies. > > Is this device part of the vanilla board out of interest? > no, unfortunately not, but you can get an extension board for 20$. >> * kernel 3.10.16 (foundation based, as the vanilla RPI kernel still misses >> dmaengine, so it is easier to develop against this kernel) > > Patches really need to be sent against the latest code for application - > this is going to collide with the factoring of the validation out of the > lock that I did too. Normally that should be > > git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next > > though until the merge window is over that'll need to be topic/core. > I will provide the patches as soon as I have pulled your tree - only problem is that I can not really compile it on this tree on my RPI... As you have said: it is a bit complicated with the RPI board+kernel right now... I got a separate board for these exercises in the future, but without dma-engine or USB/networking it is a bit tough to develop and debug a system over the serial line... > > This stuff does more than verify, it also initialises the message with > defaults - if it were really a verify() then it'd be able to have the > message passed as a const. For now I will call it: spi_message_verify_and_prepare. The issue is that we also need this when optimizing the message - otherwise I would have left it where it is... Also I started to think of how I can optimize the DMA-chain generation by creating "dma_fragments" that I just need to join together and which are cached by the bus driver (assuming that we have similar usage all the way, so that the cached objects are repeatedly used. This is still work in progress, but it should bring down CPU utilization from 45% hopefully to the same range as the current interrupt driven driver. This might then get pushed into the direction of DMA-eninge to become more generic. The caches that I can think of right now are: dma_fragment_cache_set_cdiv dma_fragment_cache_init_transfer dma_fragment_cache_transfer_txrx dma_fragment_cache_cs_change dma_fragment_cache_delay dma_fragment_cache_end_of_message These would match the "fragments" that I can think of. OK, we could add cdiv into init_transfer, but changing bus-speeds is not very common - I would assume, so it would just add a delay... Each of those unfortunately would require a separate create/free function to populate the caches. So generalizing this into spi_master would blow up this structure... But then these cache structures would need to sit somewhere anyway, so adding them to the master would need to happen one way or another... But then maybe there are different ideas for something like this (but I assume a driver making use of this would be required first) Ciao, Martin-- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <05C64EF6-0F3B-4CBC-A462-45ED8F1564A7-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>]
* Re: [PATCH] added API: spi_message_compile/optimize/hint/... [not found] ` <05C64EF6-0F3B-4CBC-A462-45ED8F1564A7-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> @ 2013-11-20 16:24 ` Mark Brown [not found] ` <20131120162449.GJ2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Mark Brown @ 2013-11-20 16:24 UTC (permalink / raw) To: Martin Sperl; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 1913 bytes --] On Wed, Nov 20, 2013 at 04:27:06PM +0100, Martin Sperl wrote: > As mentioned before I will take the opportunity to move the flags into > spi_transfer for optimization hints, because there is where the change > really happens... These are set by the caller? > The current bits come to my mind: > #define SPI_OPTIMIZE_FIXED_TX_BUF (1<<0) > #define SPI_OPTIMIZE_FIXED_TX_DMA ((1<<1)|(1<<0)) > #define SPI_OPTIMIZE_FIXED_RX_BUF (1<<2) > #define SPI_OPTIMIZE_FIXED_RX_DMA ((1<<3)|(1<<2)) > #define SPI_OPTIMIZE_FIXED_LEN (1<<4) > #define SPI_OPTIMIZE_FIXED_DELAY (1<<5) > #define SPI_OPTIMIZE_FIXED_SPEED (1<<6) > #define SPI_OPTIMIZE_FIXED_BITS (1<<7) > #define SPI_OPTIMIZE_FIXED_ALL (-1) > The flags argument now is of type gfp_t to allow for memory allocation > policies. I'd be tempted to start with assuming everything is fixed and then add flags to mark things as variable rather than the other way around since that's the most conservative option and should permit the greatest level of optimisation. > I will provide the patches as soon as I have pulled your tree - only > problem is that I can not really compile it on this tree on my RPI... You should at least be able to do build tests, it's OK if you can't do runtime tests - I can do validation myself, as can other people working with -next. > Each of those unfortunately would require a separate create/free function > to populate the caches. So generalizing this into spi_master would blow up > this structure... But then these cache structures would need to sit > somewhere anyway, so adding them to the master would need to happen > one way or another... But then maybe there are different ideas for something > like this (but I assume a driver making use of this would be required first) Yes, there's some tricky issues here... let's have a think about it and try to do this incrementally. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20131120162449.GJ2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>]
* Re: [PATCH] added API: spi_message_compile/optimize/hint/... [not found] ` <20131120162449.GJ2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> @ 2013-11-20 18:42 ` Martin Sperl [not found] ` <9DE09AF6-AD13-45DE-A36F-BD65E619C342-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Martin Sperl @ 2013-11-20 18:42 UTC (permalink / raw) To: Mark Brown; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA >> As mentioned before I will take the opportunity to move the flags into >> spi_transfer for optimization hints, because there is where the change >> really happens... > > These are set by the caller? yes - well similar to change_cs, delay_us,... Patch looks like this: @@ -585,6 +587,17 @@ struct spi_transfer { u16 delay_usecs; u32 speed_hz; +#define SPI_OPTIMIZE_FIXED_TX_BUF (1<<0) +#define SPI_OPTIMIZE_FIXED_TX_DMA ((1<<1)|(1<<0)) +#define SPI_OPTIMIZE_FIXED_RX_BUF (1<<2) +#define SPI_OPTIMIZE_FIXED_RX_DMA ((1<<3)|(1<<2)) +#define SPI_OPTIMIZE_FIXED_LEN (1<<4) +#define SPI_OPTIMIZE_FIXED_DELAY (1<<5) +#define SPI_OPTIMIZE_FIXED_SPEED (1<<6) +#define SPI_OPTIMIZE_FIXED_BITS (1<<7) +#define SPI_OPTIMIZE_FIXED_ALL (-1) + u32 optimize_hints; + struct list_head transfer_list; }; > > I'd be tempted to start with assuming everything is fixed and then add > flags to mark things as variable rather than the other way around since > that's the most conservative option and should permit the greatest level > of optimisation. > I thought the same at some point - it is just a matter of naming them. One argument for explicitly naming what IS fixed, is the fact that this way the default 0 maps to the current behavior, where you may not make any "assumptions". This way you could also optimize transparently (without an explicit call), but doing the optimization transparently means opening up a can of worms: * when to release the memory that was "optimized"? Do we really want to add garbage collection to the kernel? * such transparent "optimization" would also occur during an interrupt and then it would need to allocate with GFP_ATOMIC * also time spent in interrupt would be minimal - adding a thread for that ads complexity that is why the spi_message_optimize contains also a gfp_t flag to allocate from the correct pool... It is in the end a matter of taste, so tell me which way you prefer the solution: * 0 = everything variable (so the zkalloced memory is set like this) * -1 = everything variable >> I will provide the patches as soon as I have pulled your tree - only >> problem is that I can not really compile it on this tree on my RPI... > > You should at least be able to do build tests, it's OK if you can't do > runtime tests - I can do validation myself, as can other people working > with -next. I right now started to build a kernel from your sources, but it compiles slowly on a raspberry-pi... (also the default .config is not complete and a lot of things are left outside, like can support as module,... - I wish someone would maintain a better .config I could use instead of starting with make bcm2835_defconfig) > Yes, there's some tricky issues here... let's have a think about it and > try to do this incrementally. we are on the same page on this! First get the driver really working with the Foundation kernel and maybe by that time we have the RPI-dmaengine also in the kernel... Martin -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <9DE09AF6-AD13-45DE-A36F-BD65E619C342-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>]
* Re: [PATCH] added API: spi_message_compile/optimize/hint/... [not found] ` <9DE09AF6-AD13-45DE-A36F-BD65E619C342-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> @ 2013-11-21 19:10 ` Mark Brown [not found] ` <20131121191004.GH8120-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Mark Brown @ 2013-11-21 19:10 UTC (permalink / raw) To: Martin Sperl; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 1250 bytes --] On Wed, Nov 20, 2013 at 07:42:44PM +0100, Martin Sperl wrote: > One argument for explicitly naming what IS fixed, is the fact that > this way the default 0 maps to the current behavior, where you may not > make any "assumptions". Sure, but then this only applies if the client explicitly calls a function to optimise things. Anything done transparently behind clients should definitely set all the flags but things that are actively engaging with optimisation are different and it's that I'm thinking about here. Otherwise what happens is that if we add a new variable (like we did when we added support for dual and quad mode recently) we end up having to go round every client driver updating them to say that this new thing they never heard of isn't going to change which doesn't seem like winning. > I right now started to build a kernel from your sources, but it compiles > slowly on a raspberry-pi... (also the default .config is not complete and > a lot of things are left outside, like can support as module,... - I wish > someone would maintain a better .config I could use instead of starting > with make bcm2835_defconfig) Cross building FTW :) If you can't test anyway then build testing on x86 is fine. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20131121191004.GH8120-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>]
* Re: [PATCH] added API: spi_message_compile/optimize/hint/... [not found] ` <20131121191004.GH8120-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> @ 2013-11-22 6:53 ` Martin Sperl 0 siblings, 0 replies; 9+ messages in thread From: Martin Sperl @ 2013-11-22 6:53 UTC (permalink / raw) To: Mark Brown; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA Hi Mark! Here the new version of the patch that now: * applies against topic/core (and also got compiled) * has the bus-driver-specific stuff left out (but comments where this would go) * Driver hints are now only in spi_transfer * the optimize_hint bitfield is now a field of things that may change (as discussed) See also a comment in code about the possible "generic" use of optimize: In principle this code could also generically set up DMA mapping - especially for platforms that do not have to do a bounce-buffer that they need to set up and copy to/from before/after DMA. Also just allocating a bounce buffer just once may also be possible optimization to reduce overhead. This would make it easier to avoid the "unloved" dma interface (rx_dma/tx_dma) for driver developers - they would just have to call spi_message_optimize and stop caring about DMA-able addresses - it would be taken care of... We could also make flags: #define SPI_OPTIMIZE_ALLOCATE_RXBUFFER (1<<6) #define SPI_OPTIMIZE_ALLOCATE_TXBUFFER (1<<7) which would allocate the rx/tx buffers for the driver with the positive side-effect that there is no more messing with dma_masks and such and would avoid the requirement of mapping/unmapping before/after each DMA transaction. (would need to happen in the spi_validate function to be generic enough and may require some capability hints from the dma framework of what it can support). Martin Signed-Off-By: Martin Sperl <kernel-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index e40b236..6d1d773 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1680,6 +1680,53 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message) return 0; } +/** + * spi_message_optimize - analysis of the spi message + * optimizing it for later use + * @spi: device with which data will be exchanged + * @message: describes the data transfers, including completion + * callback and optimization hints + * Context: can sleep + * + * This call optimizes the message for later efficient use. + * this may mean preparing DMA pattern for the bus in question. + * + * Between the optimize and unoptimize call the structure + * of the message may not get changed in any way + * unless explicitly stated in optimize_hints in the + * spi_transfer structure, which states explicitly which parts + * may change, so that the driver may make optimizations + * in handling these messages. + * + * For example: this may set up DMA mapping early without + * the driver having to provide rx_dma/tx_dma itself. + */ + +int spi_message_optimize(struct spi_device *spi, + struct spi_message *m, + gfp_t flags) +{ + int ret; + + if (m->is_optimized) + return -EOPNOTSUPP; + + ret=__spi_validate(spi,m); + if (!ret) { + /* bus-driver specific call goes here */ + m->is_optimized=1; + } + + return ret; +} + +void spi_message_unoptimize(struct spi_device *spi, + struct spi_message *m) +{ + /* bus-driver specific call goes here */ + m->is_optimized=0; +} + static int __spi_async(struct spi_device *spi, struct spi_message *message) { struct spi_master *master = spi->master; @@ -1726,9 +1773,11 @@ int spi_async(struct spi_device *spi, struct spi_message *message) int ret; unsigned long flags; - ret = __spi_validate(spi, message); - if (ret != 0) - return ret; + if (!message->is_optimized) { + ret = __spi_validate(spi, message); + if (ret != 0) + return ret; + } spin_lock_irqsave(&master->bus_lock_spinlock, flags); @@ -1778,9 +1827,11 @@ int spi_async_locked(struct spi_device *spi, struct spi_message *message) int ret; unsigned long flags; - ret = __spi_validate(spi, message); - if (ret != 0) - return ret; + if (!message->is_optimized) { + ret = __spi_validate(spi, message); + if (ret != 0) + return ret; + } spin_lock_irqsave(&master->bus_lock_spinlock, flags); diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 8c62ba7..8fa010d 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -506,6 +506,8 @@ extern struct spi_master *spi_busnum_to_master(u16 busnum); * @delay_usecs: microseconds to delay after this transfer before * (optionally) changing the chipselect status, then starting * the next transfer or completing this @spi_message. + * @optimize_hints: bitmask to define which fields are not modified + * after a call to spi_message_optimize * @transfer_list: transfers are sequenced through @spi_message.transfers * * SPI transfers always write the same number of bytes as they read. @@ -585,6 +587,15 @@ struct spi_transfer { u16 delay_usecs; u32 speed_hz; +#define SPI_OPTIMIZE_MAYCHANGE_TX_BUF (1<<0) +#define SPI_OPTIMIZE_MAYCHANGE_RX_BUF (1<<1) + /* these bits apply also to tx_dma/rx_dma if set */ +#define SPI_OPTIMIZE_MAYCHANGE_LEN (1<<2) +#define SPI_OPTIMIZE_MAYCHANGE_DELAY (1<<3) +#define SPI_OPTIMIZE_MAYCHANGE_SPEED (1<<4) +#define SPI_OPTIMIZE_MAYCHANGE_BITS (1<<5) + u32 optimize_hints; + struct list_head transfer_list; }; @@ -594,6 +605,10 @@ struct spi_transfer { * @spi: SPI device to which the transaction is queued * @is_dma_mapped: if true, the caller provided both dma and cpu virtual * addresses for each transfer buffer + * @is_optimized: if true, then the message has been initialized via + * spi_message_optimize - this is (re)set via spi_message_(un)optimize + * @optimize_data: data owned by the bus-driver after spi_message_optimize + * is called * @complete: called to report transaction completions * @context: the argument to complete() when it's called * @actual_length: the total number of bytes that were transferred in all @@ -623,6 +638,9 @@ struct spi_message { unsigned is_dma_mapped:1; + unsigned is_optimized:1; + void *optimize_data; + /* REVISIT: we might want a flag affecting the behavior of the * last transfer ... allowing things like "read 16 bit length L" * immediately followed by "read L bytes". Basically imposing @@ -667,6 +685,13 @@ spi_transfer_del(struct spi_transfer *t) list_del(&t->transfer_list); } +extern int spi_message_optimize(struct spi_device *spi, + struct spi_message *m, + gfp_t flags); + +extern void spi_message_unoptimize(struct spi_device *spi, + struct spi_message *m); + /** * spi_message_init_with_transfers - Initialize spi_message and append transfers * @m: spi_message to be initialized -- To unsubscribe from this list: send the line "unsubscribe linux-spi" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-11-22 6:53 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-11-19 18:33 [PATCH] added API: spi_message_compile/optimize/hint/ Martin Sperl [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> 2013-11-19 19:25 ` Mark Brown 2013-11-20 9:11 ` martin sperl 2013-11-20 11:29 ` Mark Brown [not found] ` <20131120112920.GX2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> 2013-11-20 15:27 ` Martin Sperl [not found] ` <05C64EF6-0F3B-4CBC-A462-45ED8F1564A7-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> 2013-11-20 16:24 ` Mark Brown [not found] ` <20131120162449.GJ2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> 2013-11-20 18:42 ` Martin Sperl [not found] ` <9DE09AF6-AD13-45DE-A36F-BD65E619C342-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org> 2013-11-21 19:10 ` Mark Brown [not found] ` <20131121191004.GH8120-GFdadSzt00ze9xe1eoZjHA@public.gmane.org> 2013-11-22 6:53 ` Martin Sperl
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).