linux-spi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] added API: spi_message_compile/optimize/hint/...
@ 2013-11-19 18:33 Martin Sperl
       [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Sperl @ 2013-11-19 18:33 UTC (permalink / raw)
  To: linux-spi-u79uwXL29TY76Z2rM5mHXA; +Cc: Mark Brown

Hi!

Here the patch for a spi_message_compile interface 
(as discussed in a separate thread).

Naming of the API is subject to discussion - instead of "compile" it could
also be "optimize" or "hint" or ...

The patch creates:
* 3 additional variables in spi_message
* 2 additional function pointers in spi_master
* spi_message_compile/uncompile using the above
* __spi_async modified, so that the "validation" step for the message
  does not run every time.

Here some measurements using my "standard" CAN test:
* runs on Raspberry Pi without over-clocking
* optimized mcp2515 driver with 3 Message chunks and callbacks running 
  in listen only mode
* CAN bus is running at 125kHz (not 500 as before to simulate "less" load 
  - 25% to be exact)
* single can device retransmitting single frame repeatedly. (every 2.1s there
  is a gap of 15ms when no traffic happens - device reboots after watchdog
  timeout)
* spi-bcm2835 stock kernel driver is used with patch to run message pump with 
  realtime priority.
* kernel 3.10.16 (foundation based, as the vanilla RPI kernel still misses 
  dmaengine, so it is easier to develop against this kernel)

Main metrics reported are taken with "vmstat 10 12"

spi-bcm2835 realtime=0, unpatched spi.h, mcp2515 without any "compiles":
Messages received: 178523
Messages in second HW buffer: 20
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 411516  14380  26436    0    0   588    19 1340  653 12 31 37 20
 0  0      0 411232  14388  26436    0    0     0     4 45880 9412  0 35 61  4
 0  0      0 411104  14388  26436    0    0     0     0 45843 9378  0 34 66  0
 0  0      0 411040  14388  26436    0    0     0     8 45803 9528  0 41 59  0
 0  0      0 410816  14396  26436    0    0     0     6 45703 9394  0 38 53  9
 0  0      0 410624  14396  26436    0    0     0     0 45623 9270  0 33 67  0
 0  0      0 410560  14396  26436    0    0     0     0 45793 9335  0 34 66  0
 0  0      0 410464  14396  26436    0    0     0     3 45786 9367  0 35 65  0
 0  0      0 410464  14396  26436    0    0     0     0 45741 9364  0 35 65  0
 0  0      0 410496  14396  26436    0    0     0     0 45772 9360  0 35 65  0
 0  0      0 410464  14396  26436    0    0     0     0 45731 9309  0 34 66  0
 0  0      0 410528  14396  26436    0    0     0     0 45726 9287  0 34 66  0

spi-bcm2835 realtime=1 unpatched spi.h, mcp2515 without any "compiles":
Messages received: 178031
Messages in second HW buffer: 4
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 411512  14408  26440    0    0    57     2 7223 1497  1  8 89  2
 0  0      0 411384  14416  26440    0    0     0     2 45815 9827  0 34 61  5
 0  0      0 411288  14416  26440    0    0     0     0 45778 9822  0 33 67  0
 0  0      0 411128  14416  26440    0    0     0     2 45630 9807  0 35 65  0
 0  0      0 411064  14424  26440    0    0     0     6 45705 10029  0 40 54  6
 0  0      0 411032  14424  26440    0    0     0     0 45628 9807  0 33 67  0
 0  0      0 410840  14424  26440    0    0     0     0 45611 9763  0 33 67  0
 0  0      0 410744  14424  26440    0    0     0     0 45544 9716  0 33 67  0
 0  0      0 410712  14424  26440    0    0     0     0 45574 9756  0 33 67  0
 0  0      0 410712  14424  26440    0    0     0     0 45660 9807  0 33 67  0
 0  0      0 410712  14424  26440    0    0     0     0 45494 9768  0 33 67  0
 0  0      0 410712  14424  26440    0    0     0     0 45549 9778  0 34 66  0

spi-bcm2835 realtime=0 patched spi.h, mcp2515 without any "compiles":
Messages received: 178378
Messages in second HW buffer: 6
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 355064  20460  71900    0    0   182    20 8400 1786 15 17 57 11
 0  0      0 355180  20468  71900    0    0     0     2 45380 8788  0 33 59  8
 0  0      0 355116  20468  71900    0    0     0     0 46032 8863  0 33 67  0
 1  0      0 355020  20468  71900    0    0     0     2 45768 8865  0 36 64  0
 0  0      0 354924  20476  71900    0    0     0     6 45749 8869  0 39 51 10
 0  0      0 354828  20476  71900    0    0     0     0 45452 8747  0 33 67  0
 0  0      0 354732  20476  71900    0    0     0     0 45644 8757  0 33 67  0
 0  0      0 354540  20476  71900    0    0     0     0 45404 8746  0 33 67  0
 0  0      0 354508  20476  71900    0    0     0     0 45678 8739  0 32 68  0
 0  0      0 354548  20476  71900    0    0     0     0 45959 8804  0 33 67  0
 0  0      0 354516  20476  71900    0    0     0     0 45984 8831  0 33 67  0
 0  0      0 354548  20476  71900    0    0     0     0 45925 8835  0 34 66  0

spi-bcm2835 realtime=1 patched spi.h, mcp2515 without any "compiles":
Messages received: 177648
Messages in second HW buffer: 6
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 355456  20488  71908    0    0   116    13 11779 2372 10 15 68  7
 0  0      0 355296  20496  71908    0    0     0     2 45963 8916  0 35 61  3
 0  0      0 355168  20496  71908    0    0     0     0 45879 8913  0 35 65  0
 0  0      0 355104  20496  71908    0    0     0     2 45803 8880  0 36 64  0
 0  0      0 355008  20504  71908    0    0     0     8 45352 8888  0 41 48 11
 0  0      0 354880  20504  71908    0    0     0     0 45424 8846  0 36 64  0
 0  0      0 354816  20504  71908    0    0     0     0 45247 8804  0 34 66  0
 0  0      0 354720  20504  71908    0    0     0     0 45532 8824  0 35 65  0
 0  0      0 354688  20504  71908    0    0     0     0 45553 8826  0 36 64  0
 0  0      0 354720  20504  71908    0    0     0     0 45490 8834  0 34 66  0
 0  0      0 354688  20504  71908    0    0     0     0 45673 8880  0 35 65  0
 0  0      0 354688  20504  71908    0    0     0     0 45041 8674  0 34 66  0

spi-bcm2835 realtime=0 patched spi.h, mcp2515 with "compile":
Messages received: 178493
Messages in second HW buffer: 38
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 353868  20792  73116    0    0    74     9 15549 3084  8 16 71  5
 0  0      0 353804  20800  73116    0    0     0     2 45717 9051  0 32 63  5
 0  0      0 353676  20800  73116    0    0     0     0 45844 9044  0 33 67  0
 0  0      0 353620  20800  73116    0    0     0     1 45829 9120  0 36 64  0
 0  0      0 353588  20804  73116    0    0     0     2 45885 9164  0 37 58  5
 0  0      0 353460  20804  73116    0    0     0     0 45737 9070  0 32 68  0
 0  0      0 353364  20804  73116    0    0     0     0 45752 9056  0 31 69  0
 0  0      0 353204  20804  73116    0    0     0     0 45847 9078  0 33 67  0
 0  0      0 353140  20804  73116    0    0     0     0 45629 9035  0 33 67  0
 0  0      0 353108  20804  73116    0    0     0     0 45677 9052  0 32 68  0
 0  0      0 353140  20804  73116    0    0     0     0 45692 9025  0 33 67  0
 0  0      0 353172  20804  73116    0    0     0     0 45687 9023  0 33 67  0

spi-bcm2835 realtime=1 patched spi.h, mcp2515 with "compile":
Messages received: 177623
Messages in second HW buffer: 3
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 353964  20760  73112    0    0    87    10 13506 2687  9 15 70  6
 0  0      0 353836  20768  73112    0    0     0     2 45629 8960  0 34 63  4
 0  0      0 353740  20768  73112    0    0     0     0 45360 8920  0 33 67  0
 0  0      0 353676  20768  73112    0    0     0     2 45660 9033  0 35 65  0
 0  0      0 353580  20776  73112    0    0     0     6 45714 9007  0 36 57  6
 0  0      0 353452  20776  73112    0    0     0     0 45489 8897  0 32 68  0
 0  0      0 353292  20776  73112    0    0     0     0 45615 8976  0 33 67  0
 0  0      0 353164  20776  73112    0    0     0     0 45467 8942  0 34 66  0
 0  0      0 353164  20776  73112    0    0     0     0 45464 8933  0 32 68  0
 0  0      0 353196  20776  73112    0    0     0     0 45535 8937  0 33 67  0
 0  0      0 353196  20776  73112    0    0     0     0 45329 8916  0 33 67  0
 0  0      0 353228  20776  73112    0    0     0     0 45610 8953  0 33 67  0

So in summary:
* there is no regression in performance with the patch
* The improved locking does not show any real difference between 
  installations, but then we would nto be expecting an interrupt while running
  spi_async (at least not with this driver).
* there is a slight indication that System CPU values are slightly lower as we 
  have 31% and 32% System CPU that we have not seen in the tests prior to 
  applying the patch. This is thus related to the optimization of avoiding the
  repeated message checks in case of an optimized driver.

So this has a slightly positive impact on overall performance.
we could add a means to optimize all those spi_write/read/write_then_read
calls by compiling them. But that depends still on what we want to implement.

Assumptions by default should be:
* is_dma_mapped is set
* no change to structure, length, speed, bits, delays are allowed.

We may think of adding either flags for "global" changes like:
* allow length of transfer to change
* allow the dma_address of the transfer to change (so still with 
  is_dma_mapped)
* allow the address of the transfer to change (would mean overhead for 
  mapping dma on a per call basis)

>From an optimization perspective it might be better to give "hints" to what 
may change per spi_transfer, so adding a "u32 compile_hint" to spi_transfer
might be preferable to make it work best.

Please review and comment.

Thanks,
	Martin

P.s: The patch itself - it is against 3.10.16, so it might not apply cleanly...

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 32b7bb1..8f1aede 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1346,7 +1346,18 @@ int spi_setup(struct spi_device *spi)
 }
 EXPORT_SYMBOL_GPL(spi_setup);
 
-static int __spi_async(struct spi_device *spi, struct spi_message *message)
+/**
+ * spi_message_verify - verifies if the spi message is supported
+ *   by the bus-master
+ * @spi: device with which data will be exchanged
+ * @message: describes the data transfers, including completion callback
+ * Context: any (irqs may be blocked, etc)
+ *
+ * This call may be used in_irq and other contexts which can't sleep,
+ * as well as from task contexts which can sleep.
+ */
+extern int spi_message_verify(struct spi_device *spi,
+			struct spi_message *message)
 {
 	struct spi_master *master = spi->master;
 	struct spi_transfer *xfer;
@@ -1388,6 +1399,20 @@ static int __spi_async(struct spi_device *spi, struct spi_message *message)
 				return -EINVAL;
 		}
 	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(spi_message_verify);
+
+static int __spi_async(struct spi_device *spi, struct spi_message *message)
+{
+	struct spi_master *master = spi->master;
+	int ret;
+
+	if (!message->is_compiled) {
+		ret=spi_message_verify(spi,message);
+		if (ret)
+			return ret;
+	}
 
 	message->spi = spi;
 	message->status = -EINPROGRESS;
@@ -1430,15 +1455,12 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
 	unsigned long flags;
 
 	spin_lock_irqsave(&master->bus_lock_spinlock, flags);
-
-	if (master->bus_lock_flag)
-		ret = -EBUSY;
-	else
-		ret = __spi_async(spi, message);
-
+	ret=master->bus_lock_flag;
 	spin_unlock_irqrestore(&master->bus_lock_spinlock, flags);
+	if (ret)
+		return -EBUSY;
 
-	return ret;
+	return __spi_async(spi, message);
 }
 EXPORT_SYMBOL_GPL(spi_async);
 
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index 6ff26c8..cea622f 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -378,6 +378,14 @@ struct spi_master {
 	int (*unprepare_transfer_hardware)(struct spi_master *master);
 	/* gpio chip select */
 	int			*cs_gpios;
+	/* compile (static) message handlers */
+	int (*message_compile)(struct spi_master *master,
+			struct spi_message *mesg,
+			u32 flags
+		);
+	void (*message_uncompile)(struct spi_master *master,
+				struct spi_message *mesg
+		);
 };
 
 static inline void *spi_master_get_devdata(struct spi_master *master)
@@ -532,6 +540,12 @@ struct spi_transfer {
  * @spi: SPI device to which the transaction is queued
  * @is_dma_mapped: if true, the caller provided both dma and cpu virtual
  *	addresses for each transfer buffer
+ * @is_compiled: if true, then the message has been initialized
+ *      via spi_compile_message
+ * @compiled_flags: flags given when the message got compiled
+ *      via spi_message_compile
+ * @compiled_data: data owned by bus-master driver between
+ *      spi_message_compile and spi_message_uncompile
  * @complete: called to report transaction completions
  * @context: the argument to complete() when it's called
  * @actual_length: the total number of bytes that were transferred in all
@@ -561,6 +575,13 @@ struct spi_message {
 
 	unsigned		is_dma_mapped:1;
 
+	/* bus-driver owned data for initialized messages
+	 * between spi_message_compile and spi_message_uncompile
+	 */
+	unsigned                is_compiled:1;
+	u32                     compiled_flags;
+	void                    *compiled_data;
+
 	/* REVISIT:  we might want a flag affecting the behavior of the
 	 * last transfer ... allowing things like "read 16 bit length L"
 	 * immediately followed by "read L bytes".  Basically imposing
@@ -604,6 +625,41 @@ spi_transfer_del(struct spi_transfer *t)
 	list_del(&t->transfer_list);
 }
 
+extern int spi_message_verify(struct spi_device *spi,
+			struct spi_message *m);
+
+static inline int spi_message_compile(struct spi_device *spi,
+				struct spi_message *m,
+				u32 flags)
+{
+	int verify;
+	if (m->is_compiled)
+		return -EOPNOTSUPP;
+
+	if ((verify=spi_message_verify(spi,m)))
+		return verify;
+
+	m->is_compiled=1;
+	m->compiled_flags=flags;
+	m->compiled_data=NULL;
+
+	if (spi->master->message_compile)
+		return spi->master->message_compile(spi->master,m,flags);
+	else
+		return 0;
+}
+
+static inline void spi_message_uncompile(struct spi_device *spi,
+					struct spi_message *m)
+{
+	if (spi->master->message_uncompile)
+		spi->master->message_uncompile(spi->master,m);
+
+	m->is_compiled=0;
+	m->compiled_flags=0;
+	m->compiled_data=NULL;
+}
+
 /**
  * spi_message_init_with_transfers - Initialize spi_message and append transfers
  * @m: spi_message to be initialized


--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] added API: spi_message_compile/optimize/hint/...
       [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
@ 2013-11-19 19:25   ` Mark Brown
  2013-11-20  9:11   ` martin sperl
  2013-11-20 11:29   ` Mark Brown
  2 siblings, 0 replies; 9+ messages in thread
From: Mark Brown @ 2013-11-19 19:25 UTC (permalink / raw)
  To: Martin Sperl; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 401 bytes --]

On Tue, Nov 19, 2013 at 07:33:14PM +0100, Martin Sperl wrote:

> Here the patch for a spi_message_compile interface 
> (as discussed in a separate thread).

Not looked at the patch yet (and I'm about to finish for today so it'll
be tomorrow) but it seems you forgot the Signed-off-by - I need that to
be able to apply if everything is OK.  See SubmittingPatches for details
of what it means.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] added API: spi_message_compile/optimize/hint/...
       [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
  2013-11-19 19:25   ` Mark Brown
@ 2013-11-20  9:11   ` martin sperl
  2013-11-20 11:29   ` Mark Brown
  2 siblings, 0 replies; 9+ messages in thread
From: martin sperl @ 2013-11-20  9:11 UTC (permalink / raw)
  To: linux-spi-u79uwXL29TY76Z2rM5mHXA; +Cc: Mark Brown

Here the missing sign-off:
Signed-Off-By: Martin Sperl <kernel-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>

Is this sufficient or do you need to send the patch again?

By the way: I am thinking of extending the spi_transfer
with another set of compiled_flags, so that individual
spi_transfers may get marked as "modifiable".

Adding also something like the following defines:
#define SPI_MESSAGE_COMPILE_MODIFIES_RXBUF    (1<<0)
#define SPI_MESSAGE_COMPILE_MODIFIES_TXBUF    (1<<1)
#define SPI_MESSAGE_COMPILE_MODIFIES_RXDMA    ((1<<2)|(1<<0))
#define SPI_MESSAGE_COMPILE_MODIFIES_TXDMA    ((1<<3)|(1<<1))
#define SPI_MESSAGE_COMPILE_MODIFIES_LEN      (1<<4)
#define SPI_MESSAGE_COMPILE_MODIFIES_DELAY    (1<<5)
#define SPI_MESSAGE_COMPILE_IN_ATOMIC_CONTEXT (1<<30)
#define SPI_MESSAGE_COMPILE_FLAGS_SET_IN_XFER (1<<31)

Note that SPI_MESSAGE_COMPILE_FLAGS_SET_IN XFER is there mainly to
allow copying the default flags from spi_message_compile to the
individual transfers if xfer.compile_flags=0. This copy would happen
in spi_message_verify (formerly in __spi_async).

adding the following  in the list_for_each_entry:
	if (!xfer->compile_flags)
		xfer->compile_flags = message->compiled_flags;

(but it would also require a slight change in spi_message_compile
to set message->compiled_flags before calling spi_message_verify.

That would allow spi_read, spi_write, spi_write_then_read to define
its "variation" in the structure and then compile the message.
Which would give a slight boost - and if I think of it it may make
things faster for a dma driver, as there could be some "shortcuts"...
for the different cases that could be made quite efficient if it
can rely on the above hints... (essentially a list with
copy from X to Y and maybe the required DMA mapping computation - if
necessary)

That way we at least avoid the memory allocation and the computation
of structures.

Please tell me your preferences.

Martin

On 19.11.2013 19:33, Martin Sperl wrote:
> Hi!
> 
> Here the patch for a spi_message_compile interface 
> (as discussed in a separate thread).
> 
> Naming of the API is subject to discussion - instead of "compile" it could
> also be "optimize" or "hint" or ...
> 
> The patch creates:
> * 3 additional variables in spi_message
> * 2 additional function pointers in spi_master
> * spi_message_compile/uncompile using the above
> * __spi_async modified, so that the "validation" step for the message
>   does not run every time.
> 
> Here some measurements using my "standard" CAN test:
> * runs on Raspberry Pi without over-clocking
> * optimized mcp2515 driver with 3 Message chunks and callbacks running 
>   in listen only mode
> * CAN bus is running at 125kHz (not 500 as before to simulate "less" load 
>   - 25% to be exact)
> * single can device retransmitting single frame repeatedly. (every 2.1s there
>   is a gap of 15ms when no traffic happens - device reboots after watchdog
>   timeout)
> * spi-bcm2835 stock kernel driver is used with patch to run message pump with 
>   realtime priority.
> * kernel 3.10.16 (foundation based, as the vanilla RPI kernel still misses 
>   dmaengine, so it is easier to develop against this kernel)
> 
> Main metrics reported are taken with "vmstat 10 12"
> 
> spi-bcm2835 realtime=0, unpatched spi.h, mcp2515 without any "compiles":
> Messages received: 178523
> Messages in second HW buffer: 20
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 411516  14380  26436    0    0   588    19 1340  653 12 31 37 20
>  0  0      0 411232  14388  26436    0    0     0     4 45880 9412  0 35 61  4
>  0  0      0 411104  14388  26436    0    0     0     0 45843 9378  0 34 66  0
>  0  0      0 411040  14388  26436    0    0     0     8 45803 9528  0 41 59  0
>  0  0      0 410816  14396  26436    0    0     0     6 45703 9394  0 38 53  9
>  0  0      0 410624  14396  26436    0    0     0     0 45623 9270  0 33 67  0
>  0  0      0 410560  14396  26436    0    0     0     0 45793 9335  0 34 66  0
>  0  0      0 410464  14396  26436    0    0     0     3 45786 9367  0 35 65  0
>  0  0      0 410464  14396  26436    0    0     0     0 45741 9364  0 35 65  0
>  0  0      0 410496  14396  26436    0    0     0     0 45772 9360  0 35 65  0
>  0  0      0 410464  14396  26436    0    0     0     0 45731 9309  0 34 66  0
>  0  0      0 410528  14396  26436    0    0     0     0 45726 9287  0 34 66  0
> 
> spi-bcm2835 realtime=1 unpatched spi.h, mcp2515 without any "compiles":
> Messages received: 178031
> Messages in second HW buffer: 4
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 411512  14408  26440    0    0    57     2 7223 1497  1  8 89  2
>  0  0      0 411384  14416  26440    0    0     0     2 45815 9827  0 34 61  5
>  0  0      0 411288  14416  26440    0    0     0     0 45778 9822  0 33 67  0
>  0  0      0 411128  14416  26440    0    0     0     2 45630 9807  0 35 65  0
>  0  0      0 411064  14424  26440    0    0     0     6 45705 10029  0 40 54  6
>  0  0      0 411032  14424  26440    0    0     0     0 45628 9807  0 33 67  0
>  0  0      0 410840  14424  26440    0    0     0     0 45611 9763  0 33 67  0
>  0  0      0 410744  14424  26440    0    0     0     0 45544 9716  0 33 67  0
>  0  0      0 410712  14424  26440    0    0     0     0 45574 9756  0 33 67  0
>  0  0      0 410712  14424  26440    0    0     0     0 45660 9807  0 33 67  0
>  0  0      0 410712  14424  26440    0    0     0     0 45494 9768  0 33 67  0
>  0  0      0 410712  14424  26440    0    0     0     0 45549 9778  0 34 66  0
> 
> spi-bcm2835 realtime=0 patched spi.h, mcp2515 without any "compiles":
> Messages received: 178378
> Messages in second HW buffer: 6
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 355064  20460  71900    0    0   182    20 8400 1786 15 17 57 11
>  0  0      0 355180  20468  71900    0    0     0     2 45380 8788  0 33 59  8
>  0  0      0 355116  20468  71900    0    0     0     0 46032 8863  0 33 67  0
>  1  0      0 355020  20468  71900    0    0     0     2 45768 8865  0 36 64  0
>  0  0      0 354924  20476  71900    0    0     0     6 45749 8869  0 39 51 10
>  0  0      0 354828  20476  71900    0    0     0     0 45452 8747  0 33 67  0
>  0  0      0 354732  20476  71900    0    0     0     0 45644 8757  0 33 67  0
>  0  0      0 354540  20476  71900    0    0     0     0 45404 8746  0 33 67  0
>  0  0      0 354508  20476  71900    0    0     0     0 45678 8739  0 32 68  0
>  0  0      0 354548  20476  71900    0    0     0     0 45959 8804  0 33 67  0
>  0  0      0 354516  20476  71900    0    0     0     0 45984 8831  0 33 67  0
>  0  0      0 354548  20476  71900    0    0     0     0 45925 8835  0 34 66  0
> 
> spi-bcm2835 realtime=1 patched spi.h, mcp2515 without any "compiles":
> Messages received: 177648
> Messages in second HW buffer: 6
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 355456  20488  71908    0    0   116    13 11779 2372 10 15 68  7
>  0  0      0 355296  20496  71908    0    0     0     2 45963 8916  0 35 61  3
>  0  0      0 355168  20496  71908    0    0     0     0 45879 8913  0 35 65  0
>  0  0      0 355104  20496  71908    0    0     0     2 45803 8880  0 36 64  0
>  0  0      0 355008  20504  71908    0    0     0     8 45352 8888  0 41 48 11
>  0  0      0 354880  20504  71908    0    0     0     0 45424 8846  0 36 64  0
>  0  0      0 354816  20504  71908    0    0     0     0 45247 8804  0 34 66  0
>  0  0      0 354720  20504  71908    0    0     0     0 45532 8824  0 35 65  0
>  0  0      0 354688  20504  71908    0    0     0     0 45553 8826  0 36 64  0
>  0  0      0 354720  20504  71908    0    0     0     0 45490 8834  0 34 66  0
>  0  0      0 354688  20504  71908    0    0     0     0 45673 8880  0 35 65  0
>  0  0      0 354688  20504  71908    0    0     0     0 45041 8674  0 34 66  0
> 
> spi-bcm2835 realtime=0 patched spi.h, mcp2515 with "compile":
> Messages received: 178493
> Messages in second HW buffer: 38
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 353868  20792  73116    0    0    74     9 15549 3084  8 16 71  5
>  0  0      0 353804  20800  73116    0    0     0     2 45717 9051  0 32 63  5
>  0  0      0 353676  20800  73116    0    0     0     0 45844 9044  0 33 67  0
>  0  0      0 353620  20800  73116    0    0     0     1 45829 9120  0 36 64  0
>  0  0      0 353588  20804  73116    0    0     0     2 45885 9164  0 37 58  5
>  0  0      0 353460  20804  73116    0    0     0     0 45737 9070  0 32 68  0
>  0  0      0 353364  20804  73116    0    0     0     0 45752 9056  0 31 69  0
>  0  0      0 353204  20804  73116    0    0     0     0 45847 9078  0 33 67  0
>  0  0      0 353140  20804  73116    0    0     0     0 45629 9035  0 33 67  0
>  0  0      0 353108  20804  73116    0    0     0     0 45677 9052  0 32 68  0
>  0  0      0 353140  20804  73116    0    0     0     0 45692 9025  0 33 67  0
>  0  0      0 353172  20804  73116    0    0     0     0 45687 9023  0 33 67  0
> 
> spi-bcm2835 realtime=1 patched spi.h, mcp2515 with "compile":
> Messages received: 177623
> Messages in second HW buffer: 3
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  0  0      0 353964  20760  73112    0    0    87    10 13506 2687  9 15 70  6
>  0  0      0 353836  20768  73112    0    0     0     2 45629 8960  0 34 63  4
>  0  0      0 353740  20768  73112    0    0     0     0 45360 8920  0 33 67  0
>  0  0      0 353676  20768  73112    0    0     0     2 45660 9033  0 35 65  0
>  0  0      0 353580  20776  73112    0    0     0     6 45714 9007  0 36 57  6
>  0  0      0 353452  20776  73112    0    0     0     0 45489 8897  0 32 68  0
>  0  0      0 353292  20776  73112    0    0     0     0 45615 8976  0 33 67  0
>  0  0      0 353164  20776  73112    0    0     0     0 45467 8942  0 34 66  0
>  0  0      0 353164  20776  73112    0    0     0     0 45464 8933  0 32 68  0
>  0  0      0 353196  20776  73112    0    0     0     0 45535 8937  0 33 67  0
>  0  0      0 353196  20776  73112    0    0     0     0 45329 8916  0 33 67  0
>  0  0      0 353228  20776  73112    0    0     0     0 45610 8953  0 33 67  0
> 
> So in summary:
> * there is no regression in performance with the patch
> * The improved locking does not show any real difference between 
>   installations, but then we would nto be expecting an interrupt while running
>   spi_async (at least not with this driver).
> * there is a slight indication that System CPU values are slightly lower as we 
>   have 31% and 32% System CPU that we have not seen in the tests prior to 
>   applying the patch. This is thus related to the optimization of avoiding the
>   repeated message checks in case of an optimized driver.
> 
> So this has a slightly positive impact on overall performance.
> we could add a means to optimize all those spi_write/read/write_then_read
> calls by compiling them. But that depends still on what we want to implement.
> 
> Assumptions by default should be:
> * is_dma_mapped is set
> * no change to structure, length, speed, bits, delays are allowed.
> 
> We may think of adding either flags for "global" changes like:
> * allow length of transfer to change
> * allow the dma_address of the transfer to change (so still with 
>   is_dma_mapped)
> * allow the address of the transfer to change (would mean overhead for 
>   mapping dma on a per call basis)
> 
> From an optimization perspective it might be better to give "hints" to what 
> may change per spi_transfer, so adding a "u32 compile_hint" to spi_transfer
> might be preferable to make it work best.
> 
> Please review and comment.
> 
> Thanks,
> 	Martin
> 
> P.s: The patch itself - it is against 3.10.16, so it might not apply cleanly...
> 
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 32b7bb1..8f1aede 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -1346,7 +1346,18 @@ int spi_setup(struct spi_device *spi)
>  }
>  EXPORT_SYMBOL_GPL(spi_setup);
>  
> -static int __spi_async(struct spi_device *spi, struct spi_message *message)
> +/**
> + * spi_message_verify - verifies if the spi message is supported
> + *   by the bus-master
> + * @spi: device with which data will be exchanged
> + * @message: describes the data transfers, including completion callback
> + * Context: any (irqs may be blocked, etc)
> + *
> + * This call may be used in_irq and other contexts which can't sleep,
> + * as well as from task contexts which can sleep.
> + */
> +extern int spi_message_verify(struct spi_device *spi,
> +			struct spi_message *message)
>  {
>  	struct spi_master *master = spi->master;
>  	struct spi_transfer *xfer;
> @@ -1388,6 +1399,20 @@ static int __spi_async(struct spi_device *spi, struct spi_message *message)
>  				return -EINVAL;
>  		}
>  	}
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(spi_message_verify);
> +
> +static int __spi_async(struct spi_device *spi, struct spi_message *message)
> +{
> +	struct spi_master *master = spi->master;
> +	int ret;
> +
> +	if (!message->is_compiled) {
> +		ret=spi_message_verify(spi,message);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	message->spi = spi;
>  	message->status = -EINPROGRESS;
> @@ -1430,15 +1455,12 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&master->bus_lock_spinlock, flags);
> -
> -	if (master->bus_lock_flag)
> -		ret = -EBUSY;
> -	else
> -		ret = __spi_async(spi, message);
> -
> +	ret=master->bus_lock_flag;
>  	spin_unlock_irqrestore(&master->bus_lock_spinlock, flags);
> +	if (ret)
> +		return -EBUSY;
>  
> -	return ret;
> +	return __spi_async(spi, message);
>  }
>  EXPORT_SYMBOL_GPL(spi_async);
>  
> diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> index 6ff26c8..cea622f 100644
> --- a/include/linux/spi/spi.h
> +++ b/include/linux/spi/spi.h
> @@ -378,6 +378,14 @@ struct spi_master {
>  	int (*unprepare_transfer_hardware)(struct spi_master *master);
>  	/* gpio chip select */
>  	int			*cs_gpios;
> +	/* compile (static) message handlers */
> +	int (*message_compile)(struct spi_master *master,
> +			struct spi_message *mesg,
> +			u32 flags
> +		);
> +	void (*message_uncompile)(struct spi_master *master,
> +				struct spi_message *mesg
> +		);
>  };
>  
>  static inline void *spi_master_get_devdata(struct spi_master *master)
> @@ -532,6 +540,12 @@ struct spi_transfer {
>   * @spi: SPI device to which the transaction is queued
>   * @is_dma_mapped: if true, the caller provided both dma and cpu virtual
>   *	addresses for each transfer buffer
> + * @is_compiled: if true, then the message has been initialized
> + *      via spi_compile_message
> + * @compiled_flags: flags given when the message got compiled
> + *      via spi_message_compile
> + * @compiled_data: data owned by bus-master driver between
> + *      spi_message_compile and spi_message_uncompile
>   * @complete: called to report transaction completions
>   * @context: the argument to complete() when it's called
>   * @actual_length: the total number of bytes that were transferred in all
> @@ -561,6 +575,13 @@ struct spi_message {
>  
>  	unsigned		is_dma_mapped:1;
>  
> +	/* bus-driver owned data for initialized messages
> +	 * between spi_message_compile and spi_message_uncompile
> +	 */
> +	unsigned                is_compiled:1;
> +	u32                     compiled_flags;
> +	void                    *compiled_data;
> +
>  	/* REVISIT:  we might want a flag affecting the behavior of the
>  	 * last transfer ... allowing things like "read 16 bit length L"
>  	 * immediately followed by "read L bytes".  Basically imposing
> @@ -604,6 +625,41 @@ spi_transfer_del(struct spi_transfer *t)
>  	list_del(&t->transfer_list);
>  }
>  
> +extern int spi_message_verify(struct spi_device *spi,
> +			struct spi_message *m);
> +
> +static inline int spi_message_compile(struct spi_device *spi,
> +				struct spi_message *m,
> +				u32 flags)
> +{
> +	int verify;
> +	if (m->is_compiled)
> +		return -EOPNOTSUPP;
> +
> +	if ((verify=spi_message_verify(spi,m)))
> +		return verify;
> +
> +	m->is_compiled=1;
> +	m->compiled_flags=flags;
> +	m->compiled_data=NULL;
> +
> +	if (spi->master->message_compile)
> +		return spi->master->message_compile(spi->master,m,flags);
> +	else
> +		return 0;
> +}
> +
> +static inline void spi_message_uncompile(struct spi_device *spi,
> +					struct spi_message *m)
> +{
> +	if (spi->master->message_uncompile)
> +		spi->master->message_uncompile(spi->master,m);
> +
> +	m->is_compiled=0;
> +	m->compiled_flags=0;
> +	m->compiled_data=NULL;
> +}
> +
>  /**
>   * spi_message_init_with_transfers - Initialize spi_message and append transfers
>   * @m: spi_message to be initialized
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] added API: spi_message_compile/optimize/hint/...
       [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
  2013-11-19 19:25   ` Mark Brown
  2013-11-20  9:11   ` martin sperl
@ 2013-11-20 11:29   ` Mark Brown
       [not found]     ` <20131120112920.GX2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
  2 siblings, 1 reply; 9+ messages in thread
From: Mark Brown @ 2013-11-20 11:29 UTC (permalink / raw)
  To: Martin Sperl; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 3062 bytes --]

On Tue, Nov 19, 2013 at 07:33:14PM +0100, Martin Sperl wrote:

> Here the patch for a spi_message_compile interface 
> (as discussed in a separate thread).

As well as the signoff please try to follow the other stuff in
SubmittingPatches, both in terms of the formatting of the patch and in
terms of things like coding style - checkpatch is a useful tool for
checking this stuff.

The shape of the patch looks good, but there's a few issues below.

> Naming of the API is subject to discussion - instead of "compile" it could
> also be "optimize" or "hint" or ...

Hrm.  I don't really like any of these options but can't think of
anything else right now.  Perhaps optimize is the best of the options
here.

> The patch creates:
> * 3 additional variables in spi_message
> * 2 additional function pointers in spi_master

Can you keep the master interface in a separate patch please?  I'm not
sure if we want to do this or if we want to push more of the work to the
core, especially given the desire to factor the generic bits of DMA work
out of drivers, but the external interface seems very clear now so we
should be able to get that merged more easily (and then things like the
mcp2515 can start building on that).

For the master interface I think I'd like to at least see a patch adding
a driver using it go in along with the core changes.  This is probably
going to be a bit difficult right now given the situation with the
Raspberry Pi kernels unfortunately.

> Here some measurements using my "standard" CAN test:
> * runs on Raspberry Pi without over-clocking
> * optimized mcp2515 driver with 3 Message chunks and callbacks running 
>   in listen only mode

Is this device part of the vanilla board out of interest?

> * kernel 3.10.16 (foundation based, as the vanilla RPI kernel still misses 
>   dmaengine, so it is easier to develop against this kernel)

Patches really need to be sent against the latest code for application -
this is going to collide with the factoring of the validation out of the
lock that I did too.  Normally that should be 

  git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

though until the merge window is over that'll need to be topic/core.

> +/**
> + * spi_message_verify - verifies if the spi message is supported
> + *   by the bus-master
> + * @spi: device with which data will be exchanged
> + * @message: describes the data transfers, including completion callback
> + * Context: any (irqs may be blocked, etc)
> + *
> + * This call may be used in_irq and other contexts which can't sleep,
> + * as well as from task contexts which can sleep.
> + */
> +extern int spi_message_verify(struct spi_device *spi,
> +			struct spi_message *message)

This stuff does more than verify, it also initialises the message with
defaults - if it were really a verify() then it'd be able to have the
message passed as a const.

> +	if ((verify=spi_message_verify(spi,m)))
> +		return verify;

Assign and check the result separately.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] added API: spi_message_compile/optimize/hint/...
       [not found]     ` <20131120112920.GX2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
@ 2013-11-20 15:27       ` Martin Sperl
       [not found]         ` <05C64EF6-0F3B-4CBC-A462-45ED8F1564A7-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Sperl @ 2013-11-20 15:27 UTC (permalink / raw)
  To: Mark Brown; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA

OK - I will supply it as 2 patches.

So it is spi_message_optimize - unless you can think of something different.

As mentioned before I will take the opportunity to move the flags into
spi_transfer for optimization hints, because there is where the change 
really happens...

The current bits come to my mind:
#define SPI_OPTIMIZE_FIXED_TX_BUF	(1<<0)
#define SPI_OPTIMIZE_FIXED_TX_DMA	((1<<1)|(1<<0))
#define SPI_OPTIMIZE_FIXED_RX_BUF	(1<<2)
#define SPI_OPTIMIZE_FIXED_RX_DMA	((1<<3)|(1<<2))
#define SPI_OPTIMIZE_FIXED_LEN		(1<<4)
#define SPI_OPTIMIZE_FIXED_DELAY	(1<<5)
#define SPI_OPTIMIZE_FIXED_SPEED	(1<<6)
#define SPI_OPTIMIZE_FIXED_BITS		(1<<7)
#define SPI_OPTIMIZE_FIXED_ALL		(-1)

The flags argument now is of type gfp_t to allow for memory allocation
policies.

> 
> Is this device part of the vanilla board out of interest?
> 
no, unfortunately not, but you can get an extension board for 20$.

>> * kernel 3.10.16 (foundation based, as the vanilla RPI kernel still misses 
>>  dmaengine, so it is easier to develop against this kernel)
> 
> Patches really need to be sent against the latest code for application -
> this is going to collide with the factoring of the validation out of the
> lock that I did too.  Normally that should be 
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
> 
> though until the merge window is over that'll need to be topic/core.
> 

I will provide the patches as soon as I have pulled your tree - only
problem is that I can not really compile it on this tree on my RPI...

As you have said: it is a bit complicated with the RPI board+kernel right
now...

I got a separate board for these exercises in the future, but without 
dma-engine or USB/networking it is a bit tough to develop and debug a 
system over the serial line...

> 
> This stuff does more than verify, it also initialises the message with
> defaults - if it were really a verify() then it'd be able to have the
> message passed as a const.
For now I will call it: spi_message_verify_and_prepare.

The issue is that we also need this when optimizing the message - 
otherwise I would have left it where it is...

Also I started to think of how I can optimize the DMA-chain generation
by creating "dma_fragments" that I just need to join together and which are
cached by the bus driver (assuming that we have similar usage all the way,
so that the cached objects are repeatedly used.
This is still work in progress, but it should bring down CPU utilization
from 45% hopefully to the same range as the current interrupt driven driver.

This might then get pushed into the direction of DMA-eninge to become more
generic.

The caches that I can think of right now are:
	dma_fragment_cache_set_cdiv
        dma_fragment_cache_init_transfer
        dma_fragment_cache_transfer_txrx
        dma_fragment_cache_cs_change
        dma_fragment_cache_delay
        dma_fragment_cache_end_of_message
These would match the "fragments" that I can think of.
OK, we could add cdiv into init_transfer, but changing bus-speeds is not 
very common - I would assume, so it would just add a delay...

Each of those unfortunately would require a separate create/free function
to populate the caches. So generalizing this into spi_master would blow up
this structure... But then these cache structures would need to sit 
somewhere anyway, so adding them to the master would need to happen
one way or another... But then maybe there are different ideas for something
like this (but I assume a driver making use of this would be required first)

Ciao,
	Martin--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] added API: spi_message_compile/optimize/hint/...
       [not found]         ` <05C64EF6-0F3B-4CBC-A462-45ED8F1564A7-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
@ 2013-11-20 16:24           ` Mark Brown
       [not found]             ` <20131120162449.GJ2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Brown @ 2013-11-20 16:24 UTC (permalink / raw)
  To: Martin Sperl; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1913 bytes --]

On Wed, Nov 20, 2013 at 04:27:06PM +0100, Martin Sperl wrote:

> As mentioned before I will take the opportunity to move the flags into
> spi_transfer for optimization hints, because there is where the change 
> really happens...

These are set by the caller?

> The current bits come to my mind:
> #define SPI_OPTIMIZE_FIXED_TX_BUF	(1<<0)
> #define SPI_OPTIMIZE_FIXED_TX_DMA	((1<<1)|(1<<0))
> #define SPI_OPTIMIZE_FIXED_RX_BUF	(1<<2)
> #define SPI_OPTIMIZE_FIXED_RX_DMA	((1<<3)|(1<<2))
> #define SPI_OPTIMIZE_FIXED_LEN		(1<<4)
> #define SPI_OPTIMIZE_FIXED_DELAY	(1<<5)
> #define SPI_OPTIMIZE_FIXED_SPEED	(1<<6)
> #define SPI_OPTIMIZE_FIXED_BITS		(1<<7)
> #define SPI_OPTIMIZE_FIXED_ALL		(-1)

> The flags argument now is of type gfp_t to allow for memory allocation
> policies.

I'd be tempted to start with assuming everything is fixed and then add
flags to mark things as variable rather than the other way around since
that's the most conservative option and should permit the greatest level
of optimisation.

> I will provide the patches as soon as I have pulled your tree - only
> problem is that I can not really compile it on this tree on my RPI...

You should at least be able to do build tests, it's OK if you can't do
runtime tests - I can do validation myself, as can other people working
with -next.

> Each of those unfortunately would require a separate create/free function
> to populate the caches. So generalizing this into spi_master would blow up
> this structure... But then these cache structures would need to sit 
> somewhere anyway, so adding them to the master would need to happen
> one way or another... But then maybe there are different ideas for something
> like this (but I assume a driver making use of this would be required first)

Yes, there's some tricky issues here...  let's have a think about it and
try to do this incrementally.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] added API: spi_message_compile/optimize/hint/...
       [not found]             ` <20131120162449.GJ2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
@ 2013-11-20 18:42               ` Martin Sperl
       [not found]                 ` <9DE09AF6-AD13-45DE-A36F-BD65E619C342-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Sperl @ 2013-11-20 18:42 UTC (permalink / raw)
  To: Mark Brown; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA


>> As mentioned before I will take the opportunity to move the flags into
>> spi_transfer for optimization hints, because there is where the change 
>> really happens...
> 
> These are set by the caller?

yes - well similar to change_cs, delay_us,...

Patch looks like this:
@@ -585,6 +587,17 @@ struct spi_transfer {
        u16             delay_usecs;
        u32             speed_hz;
 
+#define SPI_OPTIMIZE_FIXED_TX_BUF              (1<<0)
+#define SPI_OPTIMIZE_FIXED_TX_DMA              ((1<<1)|(1<<0))
+#define SPI_OPTIMIZE_FIXED_RX_BUF              (1<<2)
+#define SPI_OPTIMIZE_FIXED_RX_DMA              ((1<<3)|(1<<2))
+#define SPI_OPTIMIZE_FIXED_LEN                 (1<<4)
+#define SPI_OPTIMIZE_FIXED_DELAY               (1<<5)
+#define SPI_OPTIMIZE_FIXED_SPEED               (1<<6)
+#define SPI_OPTIMIZE_FIXED_BITS                        (1<<7)
+#define SPI_OPTIMIZE_FIXED_ALL                 (-1)
+       u32             optimize_hints;
+
        struct list_head transfer_list;
 };


> 
> I'd be tempted to start with assuming everything is fixed and then add
> flags to mark things as variable rather than the other way around since
> that's the most conservative option and should permit the greatest level
> of optimisation.
> 

I thought the same at some point - it is just a matter of naming them.

One argument for explicitly naming what IS fixed, is the fact that
this way the default 0 maps to the current behavior, where you may not 
make any "assumptions".

This way you could also optimize transparently (without an explicit call),
but doing the optimization transparently means opening up a can of worms:
* when to release the memory that was "optimized"? Do we really want to 
  add garbage collection to the kernel?
* such transparent "optimization" would also occur during an interrupt
  and then it would need to allocate with GFP_ATOMIC
* also time spent in interrupt would be minimal - adding a thread for 
  that ads complexity

that is why the spi_message_optimize contains also a gfp_t flag to 
allocate from the correct pool...

It is in the end a matter of taste, so tell me which way you prefer the
solution:
*  0 = everything variable (so the zkalloced memory is set like this)
* -1 = everything variable 

>> I will provide the patches as soon as I have pulled your tree - only
>> problem is that I can not really compile it on this tree on my RPI...
> 
> You should at least be able to do build tests, it's OK if you can't do
> runtime tests - I can do validation myself, as can other people working
> with -next.

I right now started to build a kernel from your sources, but it compiles 
slowly on a raspberry-pi... (also the default .config is not complete and
a lot of things are left outside, like can support as module,... - I wish
someone would maintain a better .config I could use instead of starting 
with make bcm2835_defconfig)

> Yes, there's some tricky issues here...  let's have a think about it and
> try to do this incrementally.
we are on the same page on this!

First get the driver really working with the Foundation kernel and maybe
by that time we have the RPI-dmaengine also in the kernel...

Martin

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] added API: spi_message_compile/optimize/hint/...
       [not found]                 ` <9DE09AF6-AD13-45DE-A36F-BD65E619C342-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
@ 2013-11-21 19:10                   ` Mark Brown
       [not found]                     ` <20131121191004.GH8120-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Brown @ 2013-11-21 19:10 UTC (permalink / raw)
  To: Martin Sperl; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

On Wed, Nov 20, 2013 at 07:42:44PM +0100, Martin Sperl wrote:

> One argument for explicitly naming what IS fixed, is the fact that
> this way the default 0 maps to the current behavior, where you may not 
> make any "assumptions".

Sure, but then this only applies if the client explicitly calls a
function to optimise things.  Anything done transparently behind clients
should definitely set all the flags but things that are actively
engaging with optimisation are different and it's that I'm thinking
about here.  Otherwise what happens is that if we add a new variable
(like we did when we added support for dual and quad mode recently) we
end up having to go round every client driver updating them to say that
this new thing they never heard of isn't going to change which doesn't
seem like winning.

> I right now started to build a kernel from your sources, but it compiles 
> slowly on a raspberry-pi... (also the default .config is not complete and
> a lot of things are left outside, like can support as module,... - I wish
> someone would maintain a better .config I could use instead of starting 
> with make bcm2835_defconfig)

Cross building FTW :)  If you can't test anyway then build testing on
x86 is fine.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] added API: spi_message_compile/optimize/hint/...
       [not found]                     ` <20131121191004.GH8120-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
@ 2013-11-22  6:53                       ` Martin Sperl
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Sperl @ 2013-11-22  6:53 UTC (permalink / raw)
  To: Mark Brown; +Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA

Hi Mark!

Here the new version of the patch that now:
* applies against topic/core (and also got compiled)
* has the bus-driver-specific stuff left out (but comments where this
  would go)
* Driver hints are now only in spi_transfer 
* the optimize_hint bitfield is now a field of things that may change
  (as discussed)

See also a comment in code about the possible "generic" use of optimize:

In principle this code could also generically set up DMA mapping - especially
for platforms that do not have to do a bounce-buffer that they need to set up
and copy to/from before/after DMA. Also just allocating a bounce buffer just
once may also be possible optimization to reduce overhead.

This would make it easier to avoid the "unloved" dma interface (rx_dma/tx_dma)
for driver developers - they would just have to call spi_message_optimize
and stop caring about DMA-able addresses - it would be taken care of...

We could also make flags:
#define SPI_OPTIMIZE_ALLOCATE_RXBUFFER		(1<<6)
#define SPI_OPTIMIZE_ALLOCATE_TXBUFFER		(1<<7)
which would allocate the rx/tx buffers for the driver with the positive
side-effect that there is no more messing with dma_masks and such and
would avoid the requirement of mapping/unmapping before/after each DMA
transaction.
(would need to happen in the spi_validate function to be generic enough
and may require some capability hints from the dma framework of what it 
can support).

Martin

Signed-Off-By: Martin Sperl <kernel-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index e40b236..6d1d773 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1680,6 +1680,53 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
 	return 0;
 }
 
+/**
+ * spi_message_optimize - analysis of the spi message
+ *	optimizing it for later use
+ * @spi: device with which data will be exchanged
+ * @message: describes the data transfers, including completion
+ *	callback and optimization hints
+ * Context: can sleep
+ *
+ * This call optimizes the message for later efficient use.
+ * this may mean preparing DMA pattern for the bus in question.
+ *
+ * Between the optimize and unoptimize call the structure
+ * of the message may not get changed in any way
+ * unless explicitly stated in optimize_hints in the
+ * spi_transfer structure, which states explicitly which parts
+ * may change, so that the driver may make optimizations
+ * in handling these messages.
+ *
+ * For example: this may set up DMA mapping early without
+ * the driver having to provide rx_dma/tx_dma itself.
+ */
+
+int spi_message_optimize(struct spi_device *spi,
+			struct spi_message *m,
+			gfp_t flags)
+{
+	int ret;
+
+	if (m->is_optimized)
+		return -EOPNOTSUPP;
+
+	ret=__spi_validate(spi,m);
+	if (!ret) {
+		/* bus-driver specific call goes here */
+		m->is_optimized=1;
+	}
+
+	return ret;
+}
+
+void spi_message_unoptimize(struct spi_device *spi,
+			struct spi_message *m)
+{
+	/* bus-driver specific call goes here */
+	m->is_optimized=0;
+}
+
 static int __spi_async(struct spi_device *spi, struct spi_message *message)
 {
 	struct spi_master *master = spi->master;
@@ -1726,9 +1773,11 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
 	int ret;
 	unsigned long flags;
 
-	ret = __spi_validate(spi, message);
-	if (ret != 0)
-		return ret;
+	if (!message->is_optimized) {
+		ret = __spi_validate(spi, message);
+		if (ret != 0)
+			return ret;
+	}
 
 	spin_lock_irqsave(&master->bus_lock_spinlock, flags);
 
@@ -1778,9 +1827,11 @@ int spi_async_locked(struct spi_device *spi, struct spi_message *message)
 	int ret;
 	unsigned long flags;
 
-	ret = __spi_validate(spi, message);
-	if (ret != 0)
-		return ret;
+	if (!message->is_optimized) {
+		ret = __spi_validate(spi, message);
+		if (ret != 0)
+			return ret;
+	}
 
 	spin_lock_irqsave(&master->bus_lock_spinlock, flags);
 
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index 8c62ba7..8fa010d 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -506,6 +506,8 @@ extern struct spi_master *spi_busnum_to_master(u16 busnum);
  * @delay_usecs: microseconds to delay after this transfer before
  *	(optionally) changing the chipselect status, then starting
  *	the next transfer or completing this @spi_message.
+ * @optimize_hints: bitmask to define which fields are not modified
+ *      after a call to spi_message_optimize
  * @transfer_list: transfers are sequenced through @spi_message.transfers
  *
  * SPI transfers always write the same number of bytes as they read.
@@ -585,6 +587,15 @@ struct spi_transfer {
 	u16		delay_usecs;
 	u32		speed_hz;
 
+#define SPI_OPTIMIZE_MAYCHANGE_TX_BUF		(1<<0)
+#define SPI_OPTIMIZE_MAYCHANGE_RX_BUF		(1<<1)
+	/* these bits apply also to tx_dma/rx_dma if set */
+#define SPI_OPTIMIZE_MAYCHANGE_LEN	       	(1<<2)
+#define SPI_OPTIMIZE_MAYCHANGE_DELAY		(1<<3)
+#define SPI_OPTIMIZE_MAYCHANGE_SPEED		(1<<4)
+#define SPI_OPTIMIZE_MAYCHANGE_BITS		(1<<5)
+	u32	       	optimize_hints;
+
 	struct list_head transfer_list;
 };
 
@@ -594,6 +605,10 @@ struct spi_transfer {
  * @spi: SPI device to which the transaction is queued
  * @is_dma_mapped: if true, the caller provided both dma and cpu virtual
  *	addresses for each transfer buffer
+ * @is_optimized: if true, then the message has been initialized via
+ *	spi_message_optimize - this is (re)set via spi_message_(un)optimize
+ * @optimize_data: data owned by the bus-driver after spi_message_optimize
+ *	is called
  * @complete: called to report transaction completions
  * @context: the argument to complete() when it's called
  * @actual_length: the total number of bytes that were transferred in all
@@ -623,6 +638,9 @@ struct spi_message {
 
 	unsigned		is_dma_mapped:1;
 
+	unsigned		is_optimized:1;
+	void			*optimize_data;
+
 	/* REVISIT:  we might want a flag affecting the behavior of the
 	 * last transfer ... allowing things like "read 16 bit length L"
 	 * immediately followed by "read L bytes".  Basically imposing
@@ -667,6 +685,13 @@ spi_transfer_del(struct spi_transfer *t)
 	list_del(&t->transfer_list);
 }
 
+extern int spi_message_optimize(struct spi_device *spi,
+				struct spi_message *m,
+				gfp_t flags);
+
+extern void spi_message_unoptimize(struct spi_device *spi,
+				struct spi_message *m);
+
 /**
  * spi_message_init_with_transfers - Initialize spi_message and append transfers
  * @m: spi_message to be initialized

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-11-22  6:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-19 18:33 [PATCH] added API: spi_message_compile/optimize/hint/ Martin Sperl
     [not found] ` <0C7D5B1E-E561-4F52-BEA8-572EB0CA26A6-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
2013-11-19 19:25   ` Mark Brown
2013-11-20  9:11   ` martin sperl
2013-11-20 11:29   ` Mark Brown
     [not found]     ` <20131120112920.GX2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
2013-11-20 15:27       ` Martin Sperl
     [not found]         ` <05C64EF6-0F3B-4CBC-A462-45ED8F1564A7-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
2013-11-20 16:24           ` Mark Brown
     [not found]             ` <20131120162449.GJ2674-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
2013-11-20 18:42               ` Martin Sperl
     [not found]                 ` <9DE09AF6-AD13-45DE-A36F-BD65E619C342-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
2013-11-21 19:10                   ` Mark Brown
     [not found]                     ` <20131121191004.GH8120-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
2013-11-22  6:53                       ` Martin Sperl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).