linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/5] spi: add support for pre-cooking messages
@ 2024-02-12 23:26 David Lechner
  2024-02-12 23:26 ` [PATCH 1/5] spi: add spi_optimize_message() APIs David Lechner
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: David Lechner @ 2024-02-12 23:26 UTC (permalink / raw)
  To: Mark Brown
  Cc: David Lechner, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

This is a follow-up to [1] where it was suggested to break down the
proposed SPI offload support into smaller series.

This takes on the first suggested task of introducing an API to
"pre-cook" SPI messages. This idea was first discussed extensively in
2013 [2][3] and revisited more briefly in 2022 [4].

The goal here is to improve performance (higher throughput and
reduced CPU usage) by allowing peripheral drivers that use the
same struct spi_message repeatedly to "pre-cook" the message once,
avoiding repeated validation, and possibly other operations, each
time the message is sent.

This series includes __spi_validate() and the automatic splitting of
xfers in the optimizations. Another frequently suggested optimization
is doing DMA mapping only once. This is not included in this series, but
can be added later (preferably by someone with a real use case for it).

To show how this all works and get some real-world measurements, this
series includes the core changes, optimization of a SPI controller
driver, and optimization of an ADC driver. This test case was only able
to take advantage of the single validation optimization, since it didn't
require splitting transfers. With these changes, CPU usage of the
threaded interrupt handler, which calls spi_sync(), was reduced from
83% to 73% while at the same time the sample rate (frequency of SPI
xfers) was increased from 20kHz to 25kHz.
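
To put the two measurements on a common scale, they can be converted to
CPU time per sample. This is only a rough sketch: it assumes the quoted
percentages are the share of a single CPU used by the handler thread and
that all of that time scales with the number of samples. The helper name
is hypothetical and not part of this series.

```c
#include <assert.h>

/*
 * Hypothetical helper (illustration only): average CPU time per sample in
 * microseconds, given the fraction of one CPU used by the handler thread
 * and the sample rate in Hz.
 */
double cpu_time_per_sample_us(double cpu_fraction, double rate_hz)
{
	assert(rate_hz > 0.0);

	return cpu_fraction / rate_hz * 1e6;
}
```

Under those assumptions, 83% at 20kHz is about 41.5us per sample and 73%
at 25kHz is about 29.2us per sample, i.e. roughly 30% less CPU time per
sample.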

Finally, there has been quite a bit of discussion on the naming of the
API already. The most natural suggestion of spi_message_[un]prepare()
conflicts with the existing prepare_message controller callback which
does something a bit different. I've so far stuck with [un]optimize()
from [3], but am not partial to it. Maybe [un]cook() would make more
sense to people? Or maybe we could rename the existing prepare_message
callback to free up the name?

[1]: https://lore.kernel.org/linux-spi/20240109-axi-spi-engine-series-3-v1-1-e42c6a986580@baylibre.com/T/
[2]: https://lore.kernel.org/linux-spi/E81F4810-48DD-41EE-B110-D0D848B8A510@martin.sperl.org/T/
[3]: https://lore.kernel.org/linux-spi/39DEC004-10A1-47EF-9D77-276188D2580C@martin.sperl.org/T/
[4]: https://lore.kernel.org/linux-spi/20220525163946.48ea40c9@erd992/T/

---
David Lechner (5):
      spi: add spi_optimize_message() APIs
      spi: move splitting transfers to spi_optimize_message()
      spi: stm32: move splitting transfers to optimize_message
      spi: axi-spi-engine: move message compile to optimize_message
      iio: adc: ad7380: use spi_optimize_message()

 drivers/iio/adc/ad7380.c         |  52 ++++++--
 drivers/spi/spi-axi-spi-engine.c |  40 +++----
 drivers/spi/spi-stm32.c          |  28 +++--
 drivers/spi/spi.c                | 253 ++++++++++++++++++++++++++++++++-------
 include/linux/spi/spi.h          |  19 +++
 5 files changed, 305 insertions(+), 87 deletions(-)
---
base-commit: 5111fd347aee731964993fc021e428f8cf46a076
prerequisite-patch-id: 844c06b6caf25a2724e130dfa7999dc90dd26fde
change-id: 20240208-mainline-spi-precook-message-189b2f08ba7f

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-12 23:26 [PATCH 0/5] spi: add support for pre-cooking messages David Lechner
@ 2024-02-12 23:26 ` David Lechner
  2024-02-13  9:53   ` Nuno Sá
                     ` (2 more replies)
  2024-02-12 23:26 ` [PATCH 2/5] spi: move splitting transfers to spi_optimize_message() David Lechner
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 21+ messages in thread
From: David Lechner @ 2024-02-12 23:26 UTC (permalink / raw)
  To: Mark Brown
  Cc: David Lechner, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

This adds a new spi_optimize_message() function that can be used to
optimize SPI messages that are used more than once. Peripheral drivers
that use the same message multiple times can use this API to perform SPI
message validation and controller-specific optimizations once and then
reuse the message while avoiding the overhead of revalidating the
message on each spi_(a)sync() call.

Internally, the SPI core will also call this function for each message
if the peripheral driver did not explicitly call it. This is done so
that controller drivers don't have to have multiple code paths for
optimized and non-optimized messages.

A hook is provided for controller drivers to perform controller-specific
optimizations.
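
The interaction between the peripheral-facing calls and the implicit
calls made by the core can be modeled in userspace as follows. This is a
minimal sketch, not the kernel code: struct mock_msg, the mock_*()
functions and the validations counter are all hypothetical stand-ins
that only mirror the pre_optimized/optimized flag handling from the
patch below.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace mock of the two flags added to struct spi_message. */
struct mock_msg {
	bool pre_optimized;		/* peripheral driver asked for optimization */
	bool optimized;			/* shared optimize helper has run */
	unsigned int validations;	/* illustration only: counts validation passes */
};

/* Stand-in for __spi_optimize_message(): validate once, then set the flags. */
int mock_optimize(struct mock_msg *msg, bool pre_optimized)
{
	assert(!msg->optimized);	/* not valid on an already optimized message */

	msg->validations++;		/* models __spi_validate() and the controller hook */
	msg->pre_optimized = pre_optimized;
	msg->optimized = true;

	return 0;
}

/* Stand-in for __spi_unoptimize_message(). */
void mock_unoptimize(struct mock_msg *msg)
{
	assert(msg->optimized);		/* not valid on a message that is not optimized */

	msg->optimized = false;
}

/* Stand-in for spi_sync(): the core optimizes only if the caller did not. */
int mock_sync(struct mock_msg *msg)
{
	if (!msg->pre_optimized) {
		int ret = mock_optimize(msg, false);

		if (ret)
			return ret;
	}

	/* ... the transfer itself would happen here ... */

	if (!msg->pre_optimized && msg->optimized)
		mock_unoptimize(msg);

	return 0;
}

/* Peripheral-facing pair, mirroring spi_[un]optimize_message(). */
int mock_spi_optimize_message(struct mock_msg *msg)
{
	return mock_optimize(msg, true);
}

void mock_spi_unoptimize_message(struct mock_msg *msg)
{
	mock_unoptimize(msg);
	msg->pre_optimized = false;
}
```

In this model, a message that is never explicitly optimized is validated
on every mock_sync() call, while a message wrapped in
mock_spi_optimize_message()/mock_spi_unoptimize_message() is validated
exactly once no matter how many times it is sent in between.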

Suggested-by: Martin Sperl <kernel@martin.sperl.org>
Link: https://lore.kernel.org/linux-spi/39DEC004-10A1-47EF-9D77-276188D2580C@martin.sperl.org/
Signed-off-by: David Lechner <dlechner@baylibre.com>
---
 drivers/spi/spi.c       | 145 ++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/spi/spi.h |  19 +++++++
 2 files changed, 160 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index c2b10e2c75f0..5bac215d7009 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2106,6 +2106,41 @@ struct spi_message *spi_get_next_queued_message(struct spi_controller *ctlr)
 }
 EXPORT_SYMBOL_GPL(spi_get_next_queued_message);
 
+/**
+ * __spi_unoptimize_message - shared implementation of spi_unoptimize_message()
+ *                            and spi_maybe_unoptimize_message()
+ * @msg: the message to unoptimize
+ *
+ * Peripheral drivers should use spi_unoptimize_message() and callers inside
+ * core should use spi_maybe_unoptimize_message() rather than calling this
+ * function directly.
+ *
+ * It is not valid to call this on a message that is not currently optimized.
+ */
+static void __spi_unoptimize_message(struct spi_message *msg)
+{
+	struct spi_controller *ctlr = msg->spi->controller;
+
+	if (ctlr->unoptimize_message)
+		ctlr->unoptimize_message(msg);
+
+	msg->optimized = false;
+	msg->opt_state = NULL;
+}
+
+/**
+ * spi_maybe_unoptimize_message - unoptimize msg not managed by a peripheral
+ * @msg: the message to unoptimize
+ *
+ * This function is used to unoptimize a message if and only if it was
+ * optimized by the core (via spi_maybe_optimize_message()).
+ */
+static void spi_maybe_unoptimize_message(struct spi_message *msg)
+{
+	if (!msg->pre_optimized && msg->optimized)
+		__spi_unoptimize_message(msg);
+}
+
 /**
  * spi_finalize_current_message() - the current message is complete
  * @ctlr: the controller to return the message to
@@ -2153,6 +2188,8 @@ void spi_finalize_current_message(struct spi_controller *ctlr)
 
 	mesg->prepared = false;
 
+	spi_maybe_unoptimize_message(mesg);
+
 	WRITE_ONCE(ctlr->cur_msg_incomplete, false);
 	smp_mb(); /* See __spi_pump_transfer_message()... */
 	if (READ_ONCE(ctlr->cur_msg_need_completion))
@@ -4194,6 +4231,99 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
 	return 0;
 }
 
+/**
+ * __spi_optimize_message - shared implementation for spi_optimize_message()
+ *                          and spi_maybe_optimize_message()
+ * @spi: the device that will be used for the message
+ * @msg: the message to optimize
+ * @pre_optimized: whether the message is considered pre-optimized or not
+ *
+ * Peripheral drivers will call spi_optimize_message() and the spi core will
+ * call spi_maybe_optimize_message() instead of calling this directly.
+ *
+ * It is not valid to call this on a message that has already been optimized.
+ *
+ * Return: zero on success, else a negative error code
+ */
+static int __spi_optimize_message(struct spi_device *spi,
+				  struct spi_message *msg,
+				  bool pre_optimized)
+{
+	struct spi_controller *ctlr = spi->controller;
+	int ret;
+
+	ret = __spi_validate(spi, msg);
+	if (ret)
+		return ret;
+
+	if (ctlr->optimize_message) {
+		ret = ctlr->optimize_message(msg);
+		if (ret)
+			return ret;
+	}
+
+	msg->pre_optimized = pre_optimized;
+	msg->optimized = true;
+
+	return 0;
+}
+
+/**
+ * spi_maybe_optimize_message - optimize message if it isn't already pre-optimized
+ * @spi: the device that will be used for the message
+ * @msg: the message to optimize
+ * Return: zero on success, else a negative error code
+ */
+static int spi_maybe_optimize_message(struct spi_device *spi,
+				      struct spi_message *msg)
+{
+	if (msg->pre_optimized)
+		return 0;
+
+	return __spi_optimize_message(spi, msg, false);
+}
+
+/**
+ * spi_optimize_message - do any one-time validation and setup for a SPI message
+ * @spi: the device that will be used for the message
+ * @msg: the message to optimize
+ *
+ * Peripheral drivers that reuse the same message repeatedly may call this to
+ * perform as much message prep as possible once, rather than repeating it each
+ * time a message transfer is performed to improve throughput and reduce CPU
+ * usage.
+ *
+ * Once a message has been optimized, it cannot be modified with the exception
+ * of updating the contents of any xfer->tx_buf (the pointer can't be changed,
+ * only the data in the memory it points to).
+ *
+ * Calls to this function must be balanced with calls to spi_unoptimize_message()
+ * to avoid leaking resources.
+ *
+ * Context: can sleep
+ * Return: zero on success, else a negative error code
+ */
+int spi_optimize_message(struct spi_device *spi, struct spi_message *msg)
+{
+	return __spi_optimize_message(spi, msg, true);
+}
+EXPORT_SYMBOL_GPL(spi_optimize_message);
+
+/**
+ * spi_unoptimize_message - releases any resources allocated by spi_optimize_message()
+ * @msg: the message to unoptimize
+ *
+ * Calls to this function must be balanced with calls to spi_optimize_message().
+ *
+ * Context: can sleep
+ */
+void spi_unoptimize_message(struct spi_message *msg)
+{
+	__spi_unoptimize_message(msg);
+	msg->pre_optimized = false;
+}
+EXPORT_SYMBOL_GPL(spi_unoptimize_message);
+
 static int __spi_async(struct spi_device *spi, struct spi_message *message)
 {
 	struct spi_controller *ctlr = spi->controller;
@@ -4258,8 +4388,8 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
 	int ret;
 	unsigned long flags;
 
-	ret = __spi_validate(spi, message);
-	if (ret != 0)
+	ret = spi_maybe_optimize_message(spi, message);
+	if (ret)
 		return ret;
 
 	spin_lock_irqsave(&ctlr->bus_lock_spinlock, flags);
@@ -4271,6 +4401,8 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
 
 	spin_unlock_irqrestore(&ctlr->bus_lock_spinlock, flags);
 
+	spi_maybe_unoptimize_message(message);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(spi_async);
@@ -4331,10 +4463,15 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message)
 		return -ESHUTDOWN;
 	}
 
-	status = __spi_validate(spi, message);
-	if (status != 0)
+	status = spi_maybe_optimize_message(spi, message);
+	if (status)
 		return status;
 
+	/*
+	 * NB: all return paths after this point must ensure that
+	 * spi_finalize_current_message() is called to avoid leaking resources.
+	 */
+
 	SPI_STATISTICS_INCREMENT_FIELD(ctlr->pcpu_statistics, spi_sync);
 	SPI_STATISTICS_INCREMENT_FIELD(spi->pcpu_statistics, spi_sync);
 
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index 2b8e2746769a..f7a269f4956b 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -475,6 +475,8 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch
  *
  * @set_cs: set the logic level of the chip select line.  May be called
  *          from interrupt context.
+ * @optimize_message: optimize the message for reuse
+ * @unoptimize_message: release resources allocated by optimize_message
  * @prepare_message: set up the controller to transfer a single message,
  *                   for example doing DMA mapping.  Called from threaded
  *                   context.
@@ -715,6 +717,8 @@ struct spi_controller {
 	struct completion               xfer_completion;
 	size_t				max_dma_len;
 
+	int (*optimize_message)(struct spi_message *msg);
+	int (*unoptimize_message)(struct spi_message *msg);
 	int (*prepare_transfer_hardware)(struct spi_controller *ctlr);
 	int (*transfer_one_message)(struct spi_controller *ctlr,
 				    struct spi_message *mesg);
@@ -1111,6 +1115,7 @@ struct spi_transfer {
  * @spi: SPI device to which the transaction is queued
  * @is_dma_mapped: if true, the caller provided both DMA and CPU virtual
  *	addresses for each transfer buffer
+ * @optimized: spi_optimize_message was called for this message
  * @prepared: spi_prepare_message was called for the this message
  * @status: zero for success, else negative errno
  * @complete: called to report transaction completions
@@ -1120,6 +1125,7 @@ struct spi_transfer {
  *	successful segments
  * @queue: for use by whichever driver currently owns the message
  * @state: for use by whichever driver currently owns the message
+ * @opt_state: for use by whichever driver currently owns the message
  * @resources: for resource management when the SPI message is processed
  *
  * A @spi_message is used to execute an atomic sequence of data transfers,
@@ -1143,6 +1149,11 @@ struct spi_message {
 
 	unsigned		is_dma_mapped:1;
 
+	/* spi_optimize_message() was called for this message */
+	bool			pre_optimized;
+	/* __spi_optimize_message() was called for this message */
+	bool			optimized;
+
 	/* spi_prepare_message() was called for this message */
 	bool			prepared;
 
@@ -1172,6 +1183,11 @@ struct spi_message {
 	 */
 	struct list_head	queue;
 	void			*state;
+	/*
+	 * Optional state for use by controller driver between calls to
+	 * spi_optimize_message() and spi_unoptimize_message().
+	 */
+	void			*opt_state;
 
 	/* List of spi_res resources when the SPI message is processed */
 	struct list_head        resources;
@@ -1255,6 +1271,9 @@ static inline void spi_message_free(struct spi_message *m)
 	kfree(m);
 }
 
+extern int spi_optimize_message(struct spi_device *spi, struct spi_message *msg);
+extern void spi_unoptimize_message(struct spi_message *msg);
+
 extern int spi_setup(struct spi_device *spi);
 extern int spi_async(struct spi_device *spi, struct spi_message *message);
 extern int spi_slave_abort(struct spi_device *spi);

-- 
2.43.0



* [PATCH 2/5] spi: move splitting transfers to spi_optimize_message()
  2024-02-12 23:26 [PATCH 0/5] spi: add support for pre-cooking messages David Lechner
  2024-02-12 23:26 ` [PATCH 1/5] spi: add spi_optimize_message() APIs David Lechner
@ 2024-02-12 23:26 ` David Lechner
  2024-02-13 17:35   ` Jonathan Cameron
  2024-02-12 23:26 ` [PATCH 3/5] spi: stm32: move splitting transfers to optimize_message David Lechner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: David Lechner @ 2024-02-12 23:26 UTC (permalink / raw)
  To: Mark Brown
  Cc: David Lechner, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

Splitting transfers is an expensive operation, so we can potentially
optimize it by doing it only once per optimization of the message
instead of repeating it each time the message is transferred.

The transfer splitting functions are currently the only user of
spi_res_alloc() so spi_res_release() can be safely moved at this time
from spi_finalize_current_message() to spi_unoptimize_message().

The doc comments of the public functions for splitting transfers are
also updated so that callers will know when it is safe to call them
to ensure proper resource management.
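
For readers unfamiliar with what splitting by maximum size means, the
arithmetic can be sketched as below. This is a simplified userspace
model with hypothetical helper names; the real
spi_split_transfers_maxsize() additionally replaces the transfer in the
message's list and adjusts the buffer pointers of each piece, which is
the part that makes the operation expensive.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of pieces needed to carry len bytes when no
 * piece may exceed maxsize bytes (round-up division).
 */
size_t chunk_count(size_t len, size_t maxsize)
{
	assert(maxsize > 0);

	return (len + maxsize - 1) / maxsize;
}

/*
 * Hypothetical helper: length of piece idx (0-based). Every piece is
 * maxsize bytes long except possibly the last, which carries the rest.
 */
size_t chunk_len(size_t len, size_t maxsize, size_t idx)
{
	size_t n = chunk_count(len, maxsize);

	assert(idx < n);

	if (idx + 1 < n)
		return maxsize;

	return len - (n - 1) * maxsize;
}
```

For example, a 10-byte transfer with a 4-byte limit becomes three pieces
of 4, 4 and 2 bytes. With this patch that work is done once per
optimization of the message rather than once per transfer.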

Signed-off-by: David Lechner <dlechner@baylibre.com>
---
 drivers/spi/spi.c | 110 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 68 insertions(+), 42 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 5bac215d7009..8a21fa5bd4b9 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1747,38 +1747,6 @@ static int __spi_pump_transfer_message(struct spi_controller *ctlr,
 
 	trace_spi_message_start(msg);
 
-	/*
-	 * If an SPI controller does not support toggling the CS line on each
-	 * transfer (indicated by the SPI_CS_WORD flag) or we are using a GPIO
-	 * for the CS line, we can emulate the CS-per-word hardware function by
-	 * splitting transfers into one-word transfers and ensuring that
-	 * cs_change is set for each transfer.
-	 */
-	if ((msg->spi->mode & SPI_CS_WORD) && (!(ctlr->mode_bits & SPI_CS_WORD) ||
-					       spi_is_csgpiod(msg->spi))) {
-		ret = spi_split_transfers_maxwords(ctlr, msg, 1);
-		if (ret) {
-			msg->status = ret;
-			spi_finalize_current_message(ctlr);
-			return ret;
-		}
-
-		list_for_each_entry(xfer, &msg->transfers, transfer_list) {
-			/* Don't change cs_change on the last entry in the list */
-			if (list_is_last(&xfer->transfer_list, &msg->transfers))
-				break;
-			xfer->cs_change = 1;
-		}
-	} else {
-		ret = spi_split_transfers_maxsize(ctlr, msg,
-						  spi_max_transfer_size(msg->spi));
-		if (ret) {
-			msg->status = ret;
-			spi_finalize_current_message(ctlr);
-			return ret;
-		}
-	}
-
 	if (ctlr->prepare_message) {
 		ret = ctlr->prepare_message(ctlr, msg);
 		if (ret) {
@@ -2124,6 +2092,8 @@ static void __spi_unoptimize_message(struct spi_message *msg)
 	if (ctlr->unoptimize_message)
 		ctlr->unoptimize_message(msg);
 
+	spi_res_release(ctlr, msg);
+
 	msg->optimized = false;
 	msg->opt_state = NULL;
 }
@@ -2169,15 +2139,6 @@ void spi_finalize_current_message(struct spi_controller *ctlr)
 
 	spi_unmap_msg(ctlr, mesg);
 
-	/*
-	 * In the prepare_messages callback the SPI bus has the opportunity
-	 * to split a transfer to smaller chunks.
-	 *
-	 * Release the split transfers here since spi_map_msg() is done on
-	 * the split transfers.
-	 */
-	spi_res_release(ctlr, mesg);
-
 	if (mesg->prepared && ctlr->unprepare_message) {
 		ret = ctlr->unprepare_message(ctlr, mesg);
 		if (ret) {
@@ -3819,6 +3780,10 @@ static int __spi_split_transfer_maxsize(struct spi_controller *ctlr,
  * @msg:   the @spi_message to transform
  * @maxsize:  the maximum when to apply this
  *
+ * This function allocates resources that are automatically freed during the
+ * spi message unoptimize phase so this function should only be called from
+ * optimize_message callbacks.
+ *
  * Return: status of transformation
  */
 int spi_split_transfers_maxsize(struct spi_controller *ctlr,
@@ -3857,6 +3822,10 @@ EXPORT_SYMBOL_GPL(spi_split_transfers_maxsize);
  * @msg:      the @spi_message to transform
  * @maxwords: the number of words to limit each transfer to
  *
+ * This function allocates resources that are automatically freed during the
+ * spi message unoptimize phase so this function should only be called from
+ * optimize_message callbacks.
+ *
  * Return: status of transformation
  */
 int spi_split_transfers_maxwords(struct spi_controller *ctlr,
@@ -4231,6 +4200,57 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
 	return 0;
 }
 
+/**
+ * spi_split_transfers - generic handling of transfer splitting
+ * @msg: the message to split
+ *
+ * Under certain conditions, a SPI controller may not support arbitrary
+ * transfer sizes or other features required by a peripheral. This function
+ * will split the transfers in the message into smaller transfers that are
+ * supported by the controller.
+ *
+ * Controllers with special requirements not covered here can also split
+ * transfers in the optimize_message() callback.
+ *
+ * Context: can sleep
+ * Return: zero on success, else a negative error code
+ */
+static int spi_split_transfers(struct spi_message *msg)
+{
+	struct spi_controller *ctlr = msg->spi->controller;
+	struct spi_transfer *xfer;
+	int ret;
+
+	/*
+	 * If an SPI controller does not support toggling the CS line on each
+	 * transfer (indicated by the SPI_CS_WORD flag) or we are using a GPIO
+	 * for the CS line, we can emulate the CS-per-word hardware function by
+	 * splitting transfers into one-word transfers and ensuring that
+	 * cs_change is set for each transfer.
+	 */
+	if ((msg->spi->mode & SPI_CS_WORD) && (!(ctlr->mode_bits & SPI_CS_WORD) ||
+					       spi_is_csgpiod(msg->spi))) {
+		ret = spi_split_transfers_maxwords(ctlr, msg, 1);
+		if (ret)
+			return ret;
+
+		list_for_each_entry(xfer, &msg->transfers, transfer_list) {
+			/* Don't change cs_change on the last entry in the list */
+			if (list_is_last(&xfer->transfer_list, &msg->transfers))
+				break;
+
+			xfer->cs_change = 1;
+		}
+	} else {
+		ret = spi_split_transfers_maxsize(ctlr, msg,
+						  spi_max_transfer_size(msg->spi));
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 /**
  * __spi_optimize_message - shared implementation for spi_optimize_message()
  *                          and spi_maybe_optimize_message()
@@ -4256,10 +4276,16 @@ static int __spi_optimize_message(struct spi_device *spi,
 	if (ret)
 		return ret;
 
+	ret = spi_split_transfers(msg);
+	if (ret)
+		return ret;
+
 	if (ctlr->optimize_message) {
 		ret = ctlr->optimize_message(msg);
-		if (ret)
+		if (ret) {
+			spi_res_release(ctlr, msg);
 			return ret;
+		}
 	}
 
 	msg->pre_optimized = pre_optimized;

-- 
2.43.0



* [PATCH 3/5] spi: stm32: move splitting transfers to optimize_message
  2024-02-12 23:26 [PATCH 0/5] spi: add support for pre-cooking messages David Lechner
  2024-02-12 23:26 ` [PATCH 1/5] spi: add spi_optimize_message() APIs David Lechner
  2024-02-12 23:26 ` [PATCH 2/5] spi: move splitting transfers to spi_optimize_message() David Lechner
@ 2024-02-12 23:26 ` David Lechner
  2024-02-12 23:26 ` [PATCH 4/5] spi: axi-spi-engine: move message compile " David Lechner
  2024-02-12 23:26 ` [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message() David Lechner
  4 siblings, 0 replies; 21+ messages in thread
From: David Lechner @ 2024-02-12 23:26 UTC (permalink / raw)
  To: Mark Brown
  Cc: David Lechner, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

Since splitting transfers was moved to spi_optimize_message() in the
core SPI code, we now need to use the optimize_message callback in the
STM32 SPI driver to ensure that the operation is only performed once
when spi_optimize_message() is used by peripheral drivers explicitly.

Signed-off-by: David Lechner <dlechner@baylibre.com>
---
 drivers/spi/spi-stm32.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c
index c32e57bb38bd..e4e7ddb7524a 100644
--- a/drivers/spi/spi-stm32.c
+++ b/drivers/spi/spi-stm32.c
@@ -1118,6 +1118,21 @@ static irqreturn_t stm32h7_spi_irq_thread(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static int stm32_spi_optimize_message(struct spi_message *msg)
+{
+	struct spi_controller *ctrl = msg->spi->controller;
+	struct stm32_spi *spi = spi_controller_get_devdata(ctrl);
+
+	/* On STM32H7, messages should not exceed a maximum size set
+	 * later via the set_number_of_data function. In order to
+	 * ensure that, split large messages into several messages
+	 */
+	if (spi->cfg->set_number_of_data)
+		return spi_split_transfers_maxwords(ctrl, msg, spi->t_size_max);
+
+	return 0;
+}
+
 /**
  * stm32_spi_prepare_msg - set up the controller to transfer a single message
  * @ctrl: controller interface
@@ -1163,18 +1178,6 @@ static int stm32_spi_prepare_msg(struct spi_controller *ctrl,
 		!!(spi_dev->mode & SPI_LSB_FIRST),
 		!!(spi_dev->mode & SPI_CS_HIGH));
 
-	/* On STM32H7, messages should not exceed a maximum size setted
-	 * afterward via the set_number_of_data function. In order to
-	 * ensure that, split large messages into several messages
-	 */
-	if (spi->cfg->set_number_of_data) {
-		int ret;
-
-		ret = spi_split_transfers_maxwords(ctrl, msg, spi->t_size_max);
-		if (ret)
-			return ret;
-	}
-
 	spin_lock_irqsave(&spi->lock, flags);
 
 	/* CPOL, CPHA and LSB FIRST bits have common register */
@@ -2180,6 +2183,7 @@ static int stm32_spi_probe(struct platform_device *pdev)
 	ctrl->max_speed_hz = spi->clk_rate / spi->cfg->baud_rate_div_min;
 	ctrl->min_speed_hz = spi->clk_rate / spi->cfg->baud_rate_div_max;
 	ctrl->use_gpio_descriptors = true;
+	ctrl->optimize_message = stm32_spi_optimize_message;
 	ctrl->prepare_message = stm32_spi_prepare_msg;
 	ctrl->transfer_one = stm32_spi_transfer_one;
 	ctrl->unprepare_message = stm32_spi_unprepare_msg;

-- 
2.43.0



* [PATCH 4/5] spi: axi-spi-engine: move message compile to optimize_message
  2024-02-12 23:26 [PATCH 0/5] spi: add support for pre-cooking messages David Lechner
                   ` (2 preceding siblings ...)
  2024-02-12 23:26 ` [PATCH 3/5] spi: stm32: move splitting transfers to optimize_message David Lechner
@ 2024-02-12 23:26 ` David Lechner
  2024-02-12 23:26 ` [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message() David Lechner
  4 siblings, 0 replies; 21+ messages in thread
From: David Lechner @ 2024-02-12 23:26 UTC (permalink / raw)
  To: Mark Brown
  Cc: David Lechner, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

In the AXI SPI Engine driver, compiling the message is an expensive
operation. Previously, it was done per message transfer in the
prepare_message hook. This patch moves the message compile to the
optimize_message hook so that it is only done once per message in
cases where the peripheral driver calls spi_optimize_message().

This can be a significant performance improvement for some peripherals.
For example, the ad7380 driver saw a 13% improvement in throughput
when using the AXI SPI Engine driver with this patch.

Since we now need two message states, one for the optimization stage
that doesn't change for the lifetime of the message and one that is
reset on each transfer for managing the current transfer state, the old
msg->state is split into msg->opt_state and spi_engine->msg_state. The
latter is now embedded in the driver struct, since only one message can
ever be current at a time. Because this is a hot path, not allocating a
new state on each message transfer saves a few CPU cycles and lets us
get rid of the prepare_message callback.
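
The split between the long-lived state and the per-transfer state can be
sketched in userspace as follows. All mock_* names are hypothetical and
the "compile" step is reduced to an allocation; the sketch only shows
that the program is built once and that the embedded per-transfer state
is reinitialized from it without any allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Long-lived result of the expensive compile step (models msg->opt_state). */
struct mock_program {
	size_t length;
	unsigned short instructions[];
};

/* Per-transfer bookkeeping, reset at the start of every transfer. */
struct mock_msg_state {
	const struct mock_program *p;
	const unsigned short *cmd_buf;
	size_t cmd_length;
};

struct mock_engine {
	struct mock_msg_state msg_state;	/* embedded: no allocation in the hot path */
};

struct mock_message {
	void *opt_state;
};

/* Models the optimize_message hook: build the program once and stash it. */
int mock_engine_optimize(struct mock_message *msg, size_t n_instructions)
{
	struct mock_program *p;

	p = calloc(1, sizeof(*p) + n_instructions * sizeof(p->instructions[0]));
	if (!p)
		return -1;	/* the driver returns -ENOMEM here */

	p->length = n_instructions;
	msg->opt_state = p;

	return 0;
}

/* Models the unoptimize_message hook. */
void mock_engine_unoptimize(struct mock_message *msg)
{
	free(msg->opt_state);
	msg->opt_state = NULL;
}

/* Models the start of transfer_one_message plus a pretend transfer. */
void mock_engine_transfer(struct mock_engine *eng, struct mock_message *msg)
{
	struct mock_msg_state *st = &eng->msg_state;
	const struct mock_program *p = msg->opt_state;

	/* Reinitialize the per-transfer state from the precompiled program. */
	memset(st, 0, sizeof(*st));
	st->p = p;
	st->cmd_buf = p->instructions;
	st->cmd_length = p->length;

	/* Pretend the hardware consumed every instruction. */
	st->cmd_buf += st->cmd_length;
	st->cmd_length = 0;
}
```

Calling mock_engine_transfer() any number of times between
mock_engine_optimize() and mock_engine_unoptimize() reuses the same
program, whereas the previous prepare_message approach rebuilt it for
every transfer.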

Signed-off-by: David Lechner <dlechner@baylibre.com>
---
 drivers/spi/spi-axi-spi-engine.c | 40 +++++++++++++++++-----------------------
 1 file changed, 17 insertions(+), 23 deletions(-)

diff --git a/drivers/spi/spi-axi-spi-engine.c b/drivers/spi/spi-axi-spi-engine.c
index ca66d202f0e2..6177c1a8d56e 100644
--- a/drivers/spi/spi-axi-spi-engine.c
+++ b/drivers/spi/spi-axi-spi-engine.c
@@ -109,6 +109,7 @@ struct spi_engine {
 	spinlock_t lock;
 
 	void __iomem *base;
+	struct spi_engine_message_state msg_state;
 	struct completion msg_complete;
 	unsigned int int_enable;
 };
@@ -499,17 +500,11 @@ static irqreturn_t spi_engine_irq(int irq, void *devid)
 	return IRQ_HANDLED;
 }
 
-static int spi_engine_prepare_message(struct spi_controller *host,
-				      struct spi_message *msg)
+static int spi_engine_optimize_message(struct spi_message *msg)
 {
 	struct spi_engine_program p_dry, *p;
-	struct spi_engine_message_state *st;
 	size_t size;
 
-	st = kzalloc(sizeof(*st), GFP_KERNEL);
-	if (!st)
-		return -ENOMEM;
-
 	spi_engine_precompile_message(msg);
 
 	p_dry.length = 0;
@@ -517,31 +512,22 @@ static int spi_engine_prepare_message(struct spi_controller *host,
 
 	size = sizeof(*p->instructions) * (p_dry.length + 1);
 	p = kzalloc(sizeof(*p) + size, GFP_KERNEL);
-	if (!p) {
-		kfree(st);
+	if (!p)
 		return -ENOMEM;
-	}
 
 	spi_engine_compile_message(msg, false, p);
 
 	spi_engine_program_add_cmd(p, false, SPI_ENGINE_CMD_SYNC(
 						AXI_SPI_ENGINE_CUR_MSG_SYNC_ID));
 
-	st->p = p;
-	st->cmd_buf = p->instructions;
-	st->cmd_length = p->length;
-	msg->state = st;
+	msg->opt_state = p;
 
 	return 0;
 }
 
-static int spi_engine_unprepare_message(struct spi_controller *host,
-					struct spi_message *msg)
+static int spi_engine_unoptimize_message(struct spi_message *msg)
 {
-	struct spi_engine_message_state *st = msg->state;
-
-	kfree(st->p);
-	kfree(st);
+	kfree(msg->opt_state);
 
 	return 0;
 }
@@ -550,10 +536,18 @@ static int spi_engine_transfer_one_message(struct spi_controller *host,
 	struct spi_message *msg)
 {
 	struct spi_engine *spi_engine = spi_controller_get_devdata(host);
-	struct spi_engine_message_state *st = msg->state;
+	struct spi_engine_message_state *st = &spi_engine->msg_state;
+	struct spi_engine_program *p = msg->opt_state;
 	unsigned int int_enable = 0;
 	unsigned long flags;
 
+	/* reinitialize message state for this transfer */
+	memset(st, 0, sizeof(*st));
+	st->p = p;
+	st->cmd_buf = p->instructions;
+	st->cmd_length = p->length;
+	msg->state = st;
+
 	reinit_completion(&spi_engine->msg_complete);
 
 	spin_lock_irqsave(&spi_engine->lock, flags);
@@ -658,8 +652,8 @@ static int spi_engine_probe(struct platform_device *pdev)
 	host->bits_per_word_mask = SPI_BPW_RANGE_MASK(1, 32);
 	host->max_speed_hz = clk_get_rate(spi_engine->ref_clk) / 2;
 	host->transfer_one_message = spi_engine_transfer_one_message;
-	host->prepare_message = spi_engine_prepare_message;
-	host->unprepare_message = spi_engine_unprepare_message;
+	host->optimize_message = spi_engine_optimize_message;
+	host->unoptimize_message = spi_engine_unoptimize_message;
 	host->num_chipselect = 8;
 
 	if (host->max_speed_hz == 0)

-- 
2.43.0



* [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message()
  2024-02-12 23:26 [PATCH 0/5] spi: add support for pre-cooking messages David Lechner
                   ` (3 preceding siblings ...)
  2024-02-12 23:26 ` [PATCH 4/5] spi: axi-spi-engine: move message compile " David Lechner
@ 2024-02-12 23:26 ` David Lechner
  2024-02-13  9:51   ` Nuno Sá
  2024-02-13 17:28   ` Jonathan Cameron
  4 siblings, 2 replies; 21+ messages in thread
From: David Lechner @ 2024-02-12 23:26 UTC (permalink / raw)
  To: Mark Brown
  Cc: David Lechner, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

This modifies the ad7380 ADC driver to use spi_optimize_message() to
optimize the SPI message for the buffered read operation. Since buffered
reads reuse the same SPI message for each read, this can improve
performance by reducing the overhead of setting up some parts of the SPI
message in each spi_sync() call.

Signed-off-by: David Lechner <dlechner@baylibre.com>
---
 drivers/iio/adc/ad7380.c | 52 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
index abd746aef868..5c5d2642a474 100644
--- a/drivers/iio/adc/ad7380.c
+++ b/drivers/iio/adc/ad7380.c
@@ -133,6 +133,7 @@ struct ad7380_state {
 	struct spi_device *spi;
 	struct regulator *vref;
 	struct regmap *regmap;
+	struct spi_message *msg;
 	/*
 	 * DMA (thus cache coherency maintenance) requires the
 	 * transfer buffers to live in their own cache lines.
@@ -231,19 +232,55 @@ static int ad7380_debugfs_reg_access(struct iio_dev *indio_dev, u32 reg,
 	return ret;
 }
 
+static int ad7380_buffer_preenable(struct iio_dev *indio_dev)
+{
+	struct ad7380_state *st = iio_priv(indio_dev);
+	struct spi_transfer *xfer;
+	int ret;
+
+	st->msg = spi_message_alloc(1, GFP_KERNEL);
+	if (!st->msg)
+		return -ENOMEM;
+
+	xfer = list_first_entry(&st->msg->transfers, struct spi_transfer,
+				transfer_list);
+
+	xfer->bits_per_word = st->chip_info->channels[0].scan_type.realbits;
+	xfer->len = 4;
+	xfer->rx_buf = st->scan_data.raw;
+
+	ret = spi_optimize_message(st->spi, st->msg);
+	if (ret) {
+		spi_message_free(st->msg);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int ad7380_buffer_postdisable(struct iio_dev *indio_dev)
+{
+	struct ad7380_state *st = iio_priv(indio_dev);
+
+	spi_unoptimize_message(st->msg);
+	spi_message_free(st->msg);
+
+	return 0;
+}
+
+static const struct iio_buffer_setup_ops ad7380_buffer_setup_ops = {
+	.preenable = ad7380_buffer_preenable,
+	.postdisable = ad7380_buffer_postdisable,
+};
+
 static irqreturn_t ad7380_trigger_handler(int irq, void *p)
 {
 	struct iio_poll_func *pf = p;
 	struct iio_dev *indio_dev = pf->indio_dev;
 	struct ad7380_state *st = iio_priv(indio_dev);
-	struct spi_transfer xfer = {
-		.bits_per_word = st->chip_info->channels[0].scan_type.realbits,
-		.len = 4,
-		.rx_buf = st->scan_data.raw,
-	};
 	int ret;
 
-	ret = spi_sync_transfer(st->spi, &xfer, 1);
+	ret = spi_sync(st->spi, st->msg);
 	if (ret)
 		goto out;
 
@@ -420,7 +457,8 @@ static int ad7380_probe(struct spi_device *spi)
 
 	ret = devm_iio_triggered_buffer_setup(&spi->dev, indio_dev,
 					      iio_pollfunc_store_time,
-					      ad7380_trigger_handler, NULL);
+					      ad7380_trigger_handler,
+					      &ad7380_buffer_setup_ops);
 	if (ret)
 		return ret;
 

-- 
2.43.0



* Re: [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message()
  2024-02-12 23:26 ` [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message() David Lechner
@ 2024-02-13  9:51   ` Nuno Sá
  2024-02-13 15:27     ` David Lechner
  2024-02-13 17:28   ` Jonathan Cameron
  1 sibling, 1 reply; 21+ messages in thread
From: Nuno Sá @ 2024-02-13  9:51 UTC (permalink / raw)
  To: David Lechner, Mark Brown
  Cc: Martin Sperl, David Jander, Jonathan Cameron, Michael Hennerich,
	Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Mon, 2024-02-12 at 17:26 -0600, David Lechner wrote:
> This modifies the ad7380 ADC driver to use spi_optimize_message() to
> optimize the SPI message for the buffered read operation. Since buffered
> reads reuse the same SPI message for each read, this can improve
> performance by reducing the overhead of setting up some parts of the SPI
> message in each spi_sync() call.
> 
> Signed-off-by: David Lechner <dlechner@baylibre.com>
> ---
>  drivers/iio/adc/ad7380.c | 52 +++++++++++++++++++++++++++++++++++++++++------
> -
>  1 file changed, 45 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
> index abd746aef868..5c5d2642a474 100644
> --- a/drivers/iio/adc/ad7380.c
> +++ b/drivers/iio/adc/ad7380.c
> @@ -133,6 +133,7 @@ struct ad7380_state {
>  	struct spi_device *spi;
>  	struct regulator *vref;
>  	struct regmap *regmap;
> +	struct spi_message *msg;
>  	/*
>  	 * DMA (thus cache coherency maintenance) requires the
>  	 * transfer buffers to live in their own cache lines.
> @@ -231,19 +232,55 @@ static int ad7380_debugfs_reg_access(struct iio_dev
> *indio_dev, u32 reg,
>  	return ret;
>  }
>  
> +static int ad7380_buffer_preenable(struct iio_dev *indio_dev)
> +{
> +	struct ad7380_state *st = iio_priv(indio_dev);
> +	struct spi_transfer *xfer;
> +	int ret;
> +
> +	st->msg = spi_message_alloc(1, GFP_KERNEL);
> +	if (!st->msg)
> +		return -ENOMEM;
> +
> +	xfer = list_first_entry(&st->msg->transfers, struct spi_transfer,
> +				transfer_list);
> +
> +	xfer->bits_per_word = st->chip_info->channels[0].scan_type.realbits;
> +	xfer->len = 4;
> +	xfer->rx_buf = st->scan_data.raw;
> +
> +	ret = spi_optimize_message(st->spi, st->msg);
> +	if (ret) {
> +		spi_message_free(st->msg);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int ad7380_buffer_postdisable(struct iio_dev *indio_dev)
> +{
> +	struct ad7380_state *st = iio_priv(indio_dev);
> +
> +	spi_unoptimize_message(st->msg);
> +	spi_message_free(st->msg);
> +
> +	return 0;
> +}
> +

Not such a big deal, but unless I'm missing something we could have the
spi_message (+ the transfer) statically allocated in struct ad7380_state and do
the optimize only once at probe (naturally with a proper devm action for
unoptimize). Then we would not need to do this for every buffer enable +
disable. I know in terms of performance it won't matter, but it would be less
code I guess.

Am I missing something?

- Nuno Sá



* Re: [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-12 23:26 ` [PATCH 1/5] spi: add spi_optimize_message() APIs David Lechner
@ 2024-02-13  9:53   ` Nuno Sá
  2024-02-13 15:38     ` David Lechner
  2024-02-13 17:55     ` Mark Brown
  2024-02-13 17:25   ` Jonathan Cameron
  2024-02-13 18:55   ` Mark Brown
  2 siblings, 2 replies; 21+ messages in thread
From: Nuno Sá @ 2024-02-13  9:53 UTC (permalink / raw)
  To: David Lechner, Mark Brown
  Cc: Martin Sperl, David Jander, Jonathan Cameron, Michael Hennerich,
	Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Mon, 2024-02-12 at 17:26 -0600, David Lechner wrote:
> This adds a new spi_optimize_message() function that can be used to
> optimize SPI messages that are used more than once. Peripheral drivers
> that use the same message multiple times can use this API to perform SPI
> message validation and controller-specific optimizations once and then
> reuse the message while avoiding the overhead of revalidating the
> message on each spi_(a)sync() call.
> 
> Internally, the SPI core will also call this function for each message
> if the peripheral driver did not explicitly call it. This is done so
> that controller drivers don't have to have multiple code paths for
> optimized and non-optimized messages.
> 
> A hook is provided for controller drivers to perform controller-specific
> optimizations.
> 
> Suggested-by: Martin Sperl <kernel@martin.sperl.org>
> Link:
> https://lore.kernel.org/linux-spi/39DEC004-10A1-47EF-9D77-276188D2580C@martin.sperl.org/
> Signed-off-by: David Lechner <dlechner@baylibre.com>
> ---
>  drivers/spi/spi.c       | 145 ++++++++++++++++++++++++++++++++++++++++++++++-
> -
>  include/linux/spi/spi.h |  19 +++++++
>  2 files changed, 160 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index c2b10e2c75f0..5bac215d7009 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -2106,6 +2106,41 @@ struct spi_message *spi_get_next_queued_message(struct
> spi_controller *ctlr)
>  }
>  EXPORT_SYMBOL_GPL(spi_get_next_queued_message);
>  
> +/**
> + * __spi_unoptimize_message - shared implementation of
> spi_unoptimize_message()
> + *                            and spi_maybe_unoptimize_message()
> + * @msg: the message to unoptimize
> + *
> + * Peripheral drivers should use spi_unoptimize_message() and callers inside
> + * core should use spi_maybe_unoptimize_message() rather than calling this
> + * function directly.
> + *
> + * It is not valid to call this on a message that is not currently optimized.
> + */
> +static void __spi_unoptimize_message(struct spi_message *msg)
> +{
> +	struct spi_controller *ctlr = msg->spi->controller;
> +
> +	if (ctlr->unoptimize_message)
> +		ctlr->unoptimize_message(msg);
> +
> +	msg->optimized = false;
> +	msg->opt_state = NULL;
> +}
> +
> +/**
> + * spi_maybe_unoptimize_message - unoptimize msg not managed by a peripheral
> + * @msg: the message to unoptimize
> + *
> + * This function is used to unoptimize a message if and only if it was
> + * optimized by the core (via spi_maybe_optimize_message()).
> + */
> +static void spi_maybe_unoptimize_message(struct spi_message *msg)
> +{
> +	if (!msg->pre_optimized && msg->optimized)
> +		__spi_unoptimize_message(msg);
> +}
> +
>  /**
>   * spi_finalize_current_message() - the current message is complete
>   * @ctlr: the controller to return the message to
> @@ -2153,6 +2188,8 @@ void spi_finalize_current_message(struct spi_controller
> *ctlr)
>  
>  	mesg->prepared = false;
>  
> +	spi_maybe_unoptimize_message(mesg);
> +
>  	WRITE_ONCE(ctlr->cur_msg_incomplete, false);
>  	smp_mb(); /* See __spi_pump_transfer_message()... */
>  	if (READ_ONCE(ctlr->cur_msg_need_completion))
> @@ -4194,6 +4231,99 @@ static int __spi_validate(struct spi_device *spi,
> struct spi_message *message)
>  	return 0;
>  }
>  
> +/**
> + * __spi_optimize_message - shared implementation for spi_optimize_message()
> + *                          and spi_maybe_optimize_message()
> + * @spi: the device that will be used for the message
> + * @msg: the message to optimize
> + * @pre_optimized: whether the message is considered pre-optimized or not
> + *
> + * Peripheral drivers will call spi_optimize_message() and the spi core will
> + * call spi_maybe_optimize_message() instead of calling this directly.
> + *
> + * It is not valid to call this on a message that has already been optimized.
> + *
> + * Return: zero on success, else a negative error code
> + */
> +static int __spi_optimize_message(struct spi_device *spi,
> +				  struct spi_message *msg,
> +				  bool pre_optimized)
> +{
> +	struct spi_controller *ctlr = spi->controller;
> +	int ret;
> +
> +	ret = __spi_validate(spi, msg);
> +	if (ret)
> +		return ret;
> +
> +	if (ctlr->optimize_message) {
> +		ret = ctlr->optimize_message(msg);
> +		if (ret)
> +			return ret;
> +	}

Not really sure what the spi core guarantees are or what controllers should be
expecting, but I'll still ask :). Do we need to care about locking in here,
mainly in the controller callback? For spi device related data I guess it's up
to the peripheral driver not to do anything weird or to properly protect the spi
message?

- Nuno Sá



* Re: [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message()
  2024-02-13  9:51   ` Nuno Sá
@ 2024-02-13 15:27     ` David Lechner
  2024-02-13 16:08       ` Nuno Sá
  0 siblings, 1 reply; 21+ messages in thread
From: David Lechner @ 2024-02-13 15:27 UTC (permalink / raw)
  To: Nuno Sá
  Cc: Mark Brown, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, Feb 13, 2024 at 3:47 AM Nuno Sá <noname.nuno@gmail.com> wrote:
>
> On Mon, 2024-02-12 at 17:26 -0600, David Lechner wrote:
> > This modifies the ad7380 ADC driver to use spi_optimize_message() to
> > optimize the SPI message for the buffered read operation. Since buffered
> > reads reuse the same SPI message for each read, this can improve
> > performance by reducing the overhead of setting up some parts of the SPI
> > message in each spi_sync() call.
> >
> > Signed-off-by: David Lechner <dlechner@baylibre.com>
> > ---
> >  drivers/iio/adc/ad7380.c | 52 +++++++++++++++++++++++++++++++++++++++++------
> > -
> >  1 file changed, 45 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
> > index abd746aef868..5c5d2642a474 100644
> > --- a/drivers/iio/adc/ad7380.c
> > +++ b/drivers/iio/adc/ad7380.c
> > @@ -133,6 +133,7 @@ struct ad7380_state {
> >       struct spi_device *spi;
> >       struct regulator *vref;
> >       struct regmap *regmap;
> > +     struct spi_message *msg;
> >       /*
> >        * DMA (thus cache coherency maintenance) requires the
> >        * transfer buffers to live in their own cache lines.
> > @@ -231,19 +232,55 @@ static int ad7380_debugfs_reg_access(struct iio_dev
> > *indio_dev, u32 reg,
> >       return ret;
> >  }
> >
> > +static int ad7380_buffer_preenable(struct iio_dev *indio_dev)
> > +{
> > +     struct ad7380_state *st = iio_priv(indio_dev);
> > +     struct spi_transfer *xfer;
> > +     int ret;
> > +
> > +     st->msg = spi_message_alloc(1, GFP_KERNEL);
> > +     if (!st->msg)
> > +             return -ENOMEM;
> > +
> > +     xfer = list_first_entry(&st->msg->transfers, struct spi_transfer,
> > +                             transfer_list);
> > +
> > +     xfer->bits_per_word = st->chip_info->channels[0].scan_type.realbits;
> > +     xfer->len = 4;
> > +     xfer->rx_buf = st->scan_data.raw;
> > +
> > +     ret = spi_optimize_message(st->spi, st->msg);
> > +     if (ret) {
> > +             spi_message_free(st->msg);
> > +             return ret;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int ad7380_buffer_postdisable(struct iio_dev *indio_dev)
> > +{
> > +     struct ad7380_state *st = iio_priv(indio_dev);
> > +
> > +     spi_unoptimize_message(st->msg);
> > +     spi_message_free(st->msg);
> > +
> > +     return 0;
> > +}
> > +
>
> Not such a big deal, but unless I'm missing something we could have the
> spi_message (+ the transfer) statically allocated in struct ad7380_state and do
> the optimize only once at probe (naturally with a proper devm action for
> unoptimize). Then we would not need to do this for every buffer enable +
> disable. I know in terms of performance it won't matter, but it would be less
> code I guess.
>
> Am I missing something?

No, your understanding is correct for the current state of everything
in this series. So, we could do as you suggest, but I have a feeling
that future additions to this driver might require that it gets
changed back this way eventually.
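
For reference, the probe-time variant discussed above can be sketched as a
small userspace model (not kernel code). Everything here is a hypothetical
stand-in: `struct dev`, `struct state`, `model_add_action()` and
`model_release_all()` only imitate the shape of a device-managed cleanup
action, and setting `optimized` stands in for the proposed
spi_optimize_message() call. The message and transfer are embedded in the
driver state, so nothing has to be allocated or freed per buffer enable.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTIONS 4

/* Hypothetical stand-ins for struct spi_transfer and struct spi_message. */
struct xfer { size_t len; void *rx_buf; };
struct msg { struct xfer *xfer; bool optimized; };

/* Hypothetical driver state with the message and transfer embedded. */
struct state {
	struct msg msg;
	struct xfer xfer;
	unsigned char raw[4];
};

/* Minimal model of a device-managed action list, undone in reverse order. */
struct dev {
	void (*action[MAX_ACTIONS])(void *);
	void *data[MAX_ACTIONS];
	int n;
};

/* Registers a cleanup action; on failure the action runs immediately. */
static int model_add_action(struct dev *d, void (*fn)(void *), void *data)
{
	if (d->n >= MAX_ACTIONS) {
		fn(data);
		return -ENOMEM;
	}
	d->action[d->n] = fn;
	d->data[d->n] = data;
	d->n++;
	return 0;
}

/* Models driver unbind: run every registered action, newest first. */
static void model_release_all(struct dev *d)
{
	while (d->n > 0) {
		d->n--;
		d->action[d->n](d->data[d->n]);
	}
}

/* Cleanup action: stands in for spi_unoptimize_message(). */
static void model_unoptimize(void *data)
{
	struct msg *m = data;

	m->optimized = false;
}

/* Models probe(): set up the embedded message once and optimize it once. */
static int model_probe(struct dev *d, struct state *st)
{
	st->xfer.len = sizeof(st->raw);
	st->xfer.rx_buf = st->raw;
	st->msg.xfer = &st->xfer;

	st->msg.optimized = true;	/* stands in for spi_optimize_message() */

	return model_add_action(d, model_unoptimize, &st->msg);
}
```

With this shape the preenable/postdisable callbacks are no longer needed for
the message; the trade-off is that the message can no longer change between
buffer enables without being unoptimized first.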


* Re: [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-13  9:53   ` Nuno Sá
@ 2024-02-13 15:38     ` David Lechner
  2024-02-13 17:55     ` Mark Brown
  1 sibling, 0 replies; 21+ messages in thread
From: David Lechner @ 2024-02-13 15:38 UTC (permalink / raw)
  To: Nuno Sá
  Cc: Mark Brown, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, Feb 13, 2024 at 3:50 AM Nuno Sá <noname.nuno@gmail.com> wrote:
>
> On Mon, 2024-02-12 at 17:26 -0600, David Lechner wrote:
> > This adds a new spi_optimize_message() function that can be used to
> > optimize SPI messages that are used more than once. Peripheral drivers
> > that use the same message multiple times can use this API to perform SPI
> > message validation and controller-specific optimizations once and then
> > reuse the message while avoiding the overhead of revalidating the
> > message on each spi_(a)sync() call.
> >
> > Internally, the SPI core will also call this function for each message
> > if the peripheral driver did not explicitly call it. This is done so
> > that controller drivers don't have to have multiple code paths for
> > optimized and non-optimized messages.
> >
> > A hook is provided for controller drivers to perform controller-specific
> > optimizations.
> >
> > Suggested-by: Martin Sperl <kernel@martin.sperl.org>
> > Link:
> > https://lore.kernel.org/linux-spi/39DEC004-10A1-47EF-9D77-276188D2580C@martin.sperl.org/
> > Signed-off-by: David Lechner <dlechner@baylibre.com>
> > ---
> >  drivers/spi/spi.c       | 145 ++++++++++++++++++++++++++++++++++++++++++++++-
> > -
> >  include/linux/spi/spi.h |  19 +++++++
> >  2 files changed, 160 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> > index c2b10e2c75f0..5bac215d7009 100644
> > --- a/drivers/spi/spi.c
> > +++ b/drivers/spi/spi.c
> > @@ -2106,6 +2106,41 @@ struct spi_message *spi_get_next_queued_message(struct
> > spi_controller *ctlr)
> >  }
> >  EXPORT_SYMBOL_GPL(spi_get_next_queued_message);
> >
> > +/**
> > + * __spi_unoptimize_message - shared implementation of
> > spi_unoptimize_message()
> > + *                            and spi_maybe_unoptimize_message()
> > + * @msg: the message to unoptimize
> > + *
> > + * Peripheral drivers should use spi_unoptimize_message() and callers inside
> > + * core should use spi_maybe_unoptimize_message() rather than calling this
> > + * function directly.
> > + *
> > + * It is not valid to call this on a message that is not currently optimized.
> > + */
> > +static void __spi_unoptimize_message(struct spi_message *msg)
> > +{
> > +     struct spi_controller *ctlr = msg->spi->controller;
> > +
> > +     if (ctlr->unoptimize_message)
> > +             ctlr->unoptimize_message(msg);
> > +
> > +     msg->optimized = false;
> > +     msg->opt_state = NULL;
> > +}
> > +
> > +/**
> > + * spi_maybe_unoptimize_message - unoptimize msg not managed by a peripheral
> > + * @msg: the message to unoptimize
> > + *
> > + * This function is used to unoptimize a message if and only if it was
> > + * optimized by the core (via spi_maybe_optimize_message()).
> > + */
> > +static void spi_maybe_unoptimize_message(struct spi_message *msg)
> > +{
> > +     if (!msg->pre_optimized && msg->optimized)
> > +             __spi_unoptimize_message(msg);
> > +}
> > +
> >  /**
> >   * spi_finalize_current_message() - the current message is complete
> >   * @ctlr: the controller to return the message to
> > @@ -2153,6 +2188,8 @@ void spi_finalize_current_message(struct spi_controller
> > *ctlr)
> >
> >       mesg->prepared = false;
> >
> > +     spi_maybe_unoptimize_message(mesg);
> > +
> >       WRITE_ONCE(ctlr->cur_msg_incomplete, false);
> >       smp_mb(); /* See __spi_pump_transfer_message()... */
> >       if (READ_ONCE(ctlr->cur_msg_need_completion))
> > @@ -4194,6 +4231,99 @@ static int __spi_validate(struct spi_device *spi,
> > struct spi_message *message)
> >       return 0;
> >  }
> >
> > +/**
> > + * __spi_optimize_message - shared implementation for spi_optimize_message()
> > + *                          and spi_maybe_optimize_message()
> > + * @spi: the device that will be used for the message
> > + * @msg: the message to optimize
> > + * @pre_optimized: whether the message is considered pre-optimized or not
> > + *
> > + * Peripheral drivers will call spi_optimize_message() and the spi core will
> > + * call spi_maybe_optimize_message() instead of calling this directly.
> > + *
> > + * It is not valid to call this on a message that has already been optimized.
> > + *
> > + * Return: zero on success, else a negative error code
> > + */
> > +static int __spi_optimize_message(struct spi_device *spi,
> > +                               struct spi_message *msg,
> > +                               bool pre_optimized)
> > +{
> > +     struct spi_controller *ctlr = spi->controller;
> > +     int ret;
> > +
> > +     ret = __spi_validate(spi, msg);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (ctlr->optimize_message) {
> > +             ret = ctlr->optimize_message(msg);
> > +             if (ret)
> > +                     return ret;
> > +     }
>
> Not really sure what the spi core guarantees are or what controllers should be
> expecting, but I'll still ask :). Do we need to care about locking in here,
> mainly in the controller callback? For spi device related data I guess it's up
> to the peripheral driver not to do anything weird or to properly protect the spi
> message?
>

Currently, it is expected that this operates only on the message
struct and doesn't poke any hardware, so no locking is required.
And, yes, it is up to peripheral drivers that opt in to
pre-optimization to follow the rule of not touching the message while
it is in the optimized state. For peripheral drivers that don't call
spi_optimize_message(), nothing has really changed.
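
To make the ownership rule concrete, here is a small userspace model (not
kernel code) of the lifecycle described above. `struct msg`,
`optimize_calls` and the `model_*` functions are hypothetical stand-ins; only
the `optimized` and `pre_optimized` flags and the "maybe" semantics come from
the quoted patch. Two details are assumptions of the model: that the core
skips its own optimize step by checking `pre_optimized`, and that the
explicit unoptimize clears that flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct spi_message; only the two flags matter. */
struct msg {
	bool optimized;     /* message is currently in the optimized state */
	bool pre_optimized; /* a peripheral driver owns the optimized state */
};

/* Counts how often the modelled validation + controller hook would run. */
static int optimize_calls;

/* Models __spi_optimize_message(): the expensive work happens here. */
static int model_optimize(struct msg *m, bool pre_optimized)
{
	optimize_calls++;
	m->optimized = true;
	m->pre_optimized = pre_optimized;
	return 0;
}

/* Models spi_optimize_message(): explicit opt-in by a peripheral driver. */
static int model_spi_optimize_message(struct msg *m)
{
	return model_optimize(m, true);
}

/* Models spi_unoptimize_message(); clearing pre_optimized is an assumption. */
static void model_spi_unoptimize_message(struct msg *m)
{
	m->optimized = false;
	m->pre_optimized = false;
}

/* Models spi_sync(): the core only manages state it created itself. */
static int model_spi_sync(struct msg *m)
{
	int ret;

	if (!m->pre_optimized) {	/* like spi_maybe_optimize_message() */
		ret = model_optimize(m, false);
		if (ret)
			return ret;
	}

	/* ... the transfer itself would run here ... */

	if (!m->pre_optimized && m->optimized)	/* like spi_maybe_unoptimize_message() */
		m->optimized = false;

	return 0;
}
```

In this model a driver that never opts in pays for one optimize step per
sync, exactly as before, while a driver that opts in pays once and must leave
the message untouched until it calls the unoptimize function.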


* Re: [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message()
  2024-02-13 15:27     ` David Lechner
@ 2024-02-13 16:08       ` Nuno Sá
  2024-02-13 17:31         ` Jonathan Cameron
  0 siblings, 1 reply; 21+ messages in thread
From: Nuno Sá @ 2024-02-13 16:08 UTC (permalink / raw)
  To: David Lechner
  Cc: Mark Brown, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, 2024-02-13 at 09:27 -0600, David Lechner wrote:
> On Tue, Feb 13, 2024 at 3:47 AM Nuno Sá <noname.nuno@gmail.com> wrote:
> > 
> > On Mon, 2024-02-12 at 17:26 -0600, David Lechner wrote:
> > > This modifies the ad7380 ADC driver to use spi_optimize_message() to
> > > optimize the SPI message for the buffered read operation. Since buffered
> > > reads reuse the same SPI message for each read, this can improve
> > > performance by reducing the overhead of setting up some parts of the SPI
> > > message in each spi_sync() call.
> > > 
> > > Signed-off-by: David Lechner <dlechner@baylibre.com>
> > > ---
> > >  drivers/iio/adc/ad7380.c | 52 +++++++++++++++++++++++++++++++++++++++++--
> > > ----
> > > -
> > >  1 file changed, 45 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
> > > index abd746aef868..5c5d2642a474 100644
> > > --- a/drivers/iio/adc/ad7380.c
> > > +++ b/drivers/iio/adc/ad7380.c
> > > @@ -133,6 +133,7 @@ struct ad7380_state {
> > >       struct spi_device *spi;
> > >       struct regulator *vref;
> > >       struct regmap *regmap;
> > > +     struct spi_message *msg;
> > >       /*
> > >        * DMA (thus cache coherency maintenance) requires the
> > >        * transfer buffers to live in their own cache lines.
> > > @@ -231,19 +232,55 @@ static int ad7380_debugfs_reg_access(struct iio_dev
> > > *indio_dev, u32 reg,
> > >       return ret;
> > >  }
> > > 
> > > +static int ad7380_buffer_preenable(struct iio_dev *indio_dev)
> > > +{
> > > +     struct ad7380_state *st = iio_priv(indio_dev);
> > > +     struct spi_transfer *xfer;
> > > +     int ret;
> > > +
> > > +     st->msg = spi_message_alloc(1, GFP_KERNEL);
> > > +     if (!st->msg)
> > > +             return -ENOMEM;
> > > +
> > > +     xfer = list_first_entry(&st->msg->transfers, struct spi_transfer,
> > > +                             transfer_list);
> > > +
> > > +     xfer->bits_per_word = st->chip_info->channels[0].scan_type.realbits;
> > > +     xfer->len = 4;
> > > +     xfer->rx_buf = st->scan_data.raw;
> > > +
> > > +     ret = spi_optimize_message(st->spi, st->msg);
> > > +     if (ret) {
> > > +             spi_message_free(st->msg);
> > > +             return ret;
> > > +     }
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static int ad7380_buffer_postdisable(struct iio_dev *indio_dev)
> > > +{
> > > +     struct ad7380_state *st = iio_priv(indio_dev);
> > > +
> > > +     spi_unoptimize_message(st->msg);
> > > +     spi_message_free(st->msg);
> > > +
> > > +     return 0;
> > > +}
> > > +
> > 
> > Not such a big deal, but unless I'm missing something we could have the
> > spi_message (+ the transfer) statically allocated in struct ad7380_state and
> > do the optimize only once at probe (naturally with a proper devm action for
> > unoptimize). Then we would not need to do this for every buffer enable +
> > disable. I know in terms of performance it won't matter, but it would be less
> > code I guess.
> > 
> > Am I missing something?
> 
> No, your understanding is correct for the current state of everything
> in this series. So, we could do as you suggest, but I have a feeling
> that future additions to this driver might require that it gets
> changed back this way eventually.

Hmm, not really sure about that, as chip_info stuff is always our friend :). And
anyway, I'm of the opinion that we should keep things simple and only start to
evolve them when really needed (because often we never really need to). But bah,
as I said... this is really not a big deal.

- Nuno Sá


* Re: [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-12 23:26 ` [PATCH 1/5] spi: add spi_optimize_message() APIs David Lechner
  2024-02-13  9:53   ` Nuno Sá
@ 2024-02-13 17:25   ` Jonathan Cameron
  2024-02-13 19:20     ` David Lechner
  2024-02-13 18:55   ` Mark Brown
  2 siblings, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-13 17:25 UTC (permalink / raw)
  To: David Lechner
  Cc: Mark Brown, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Mon, 12 Feb 2024 17:26:41 -0600
David Lechner <dlechner@baylibre.com> wrote:

> This adds a new spi_optimize_message() function that can be used to
> optimize SPI messages that are used more than once. Peripheral drivers
> that use the same message multiple times can use this API to perform SPI
> message validation and controller-specific optimizations once and then
> reuse the message while avoiding the overhead of revalidating the
> message on each spi_(a)sync() call.
> 
> Internally, the SPI core will also call this function for each message
> if the peripheral driver did not explicitly call it. This is done so
> that controller drivers don't have to have multiple code paths for
> optimized and non-optimized messages.
> 
> A hook is provided for controller drivers to perform controller-specific
> optimizations.
> 
> Suggested-by: Martin Sperl <kernel@martin.sperl.org>
> Link: https://lore.kernel.org/linux-spi/39DEC004-10A1-47EF-9D77-276188D2580C@martin.sperl.org/
> Signed-off-by: David Lechner <dlechner@baylibre.com>

A few trivial things inline but looks good to me in general.

I thought about suggesting splitting this into an initial patch that does just
the core bits without the controller callbacks. Maybe it would work better to
introduce the callbacks after the validation and splitting of transfers (so
most of patches 1 and 2), as a patch 3 prior to the stm32 additions?

> ---
>  drivers/spi/spi.c       | 145 ++++++++++++++++++++++++++++++++++++++++++++++--
>  include/linux/spi/spi.h |  19 +++++++
>  2 files changed, 160 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index c2b10e2c75f0..5bac215d7009 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -2106,6 +2106,41 @@ struct spi_message *spi_get_next_queued_message(struct spi_controller *ctlr)
>  }
>  EXPORT_SYMBOL_GPL(spi_get_next_queued_message);
>  
> +/**
> + * __spi_unoptimize_message - shared implementation of spi_unoptimize_message()
> + *                            and spi_maybe_unoptimize_message()
> + * @msg: the message to unoptimize
> + *
> + * Peripheral drivers should use spi_unoptimize_message() and callers inside
> + * core should use spi_maybe_unoptimize_message() rather than calling this
> + * function directly.
> + *
> + * It is not valid to call this on a message that is not currently optimized.
> + */
> +static void __spi_unoptimize_message(struct spi_message *msg)
> +{
> +	struct spi_controller *ctlr = msg->spi->controller;
> +
> +	if (ctlr->unoptimize_message)
> +		ctlr->unoptimize_message(msg);
> +
> +	msg->optimized = false;
> +	msg->opt_state = NULL;
> +}

Seems unbalanced that this doesn't take a pre_optimized flag but
__spi_optimize_message() does. I'd move the handling of that outside the call
in both cases.
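
One possible reading of that suggestion, as a userspace sketch rather than
kernel code: the shared helpers only touch `optimized`, and the wrappers on
both sides deal with `pre_optimized`. The `struct msg` type and all `model_*`
names are hypothetical; the real functions also take a struct spi_device and
call into the controller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct spi_message. */
struct msg {
	bool optimized;
	bool pre_optimized;
};

/* Shared helpers: no knowledge of who owns the optimized state. */
static int model_core_optimize(struct msg *m)
{
	m->optimized = true;
	return 0;
}

static void model_core_unoptimize(struct msg *m)
{
	m->optimized = false;
}

/* Peripheral-facing wrappers set and clear the ownership flag themselves. */
static int model_spi_optimize_message(struct msg *m)
{
	int ret = model_core_optimize(m);

	if (ret)
		return ret;

	m->pre_optimized = true;
	return 0;
}

static void model_spi_unoptimize_message(struct msg *m)
{
	model_core_unoptimize(m);
	m->pre_optimized = false;
}

/* Core-internal wrappers only act on messages the core optimized itself. */
static int model_spi_maybe_optimize_message(struct msg *m)
{
	if (m->pre_optimized)
		return 0;

	return model_core_optimize(m);
}

static void model_spi_maybe_unoptimize_message(struct msg *m)
{
	if (!m->pre_optimized && m->optimized)
		model_core_unoptimize(m);
}
```

Both shared helpers then have the same single-argument shape, and the flag is
set and cleared in exactly one pair of functions.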


>  	spin_lock_irqsave(&ctlr->bus_lock_spinlock, flags);
> @@ -4271,6 +4401,8 @@ int spi_async(struct spi_device *spi, struct spi_message *message)
>  
>  	spin_unlock_irqrestore(&ctlr->bus_lock_spinlock, flags);
>  
> +	spi_maybe_unoptimize_message(message);
> +
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(spi_async);
> @@ -4331,10 +4463,15 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message)
>  		return -ESHUTDOWN;
>  	}
>  
> -	status = __spi_validate(spi, message);
> -	if (status != 0)
> +	status = spi_maybe_optimize_message(spi, message);
> +	if (status)
>  		return status;
>  
> +	/*
> +	 * NB: all return paths after this point must ensure that
> +	 * spi_finalize_current_message() is called to avoid leaking resources.

I'm not sure a catch-all like that makes sense. Is it not sufficient to call
the finer grained spi_maybe_unoptimize_message()?
> +	 */
> +
>  	SPI_STATISTICS_INCREMENT_FIELD(ctlr->pcpu_statistics, spi_sync);
>  	SPI_STATISTICS_INCREMENT_FIELD(spi->pcpu_statistics, spi_sync);



* Re: [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message()
  2024-02-12 23:26 ` [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message() David Lechner
  2024-02-13  9:51   ` Nuno Sá
@ 2024-02-13 17:28   ` Jonathan Cameron
  1 sibling, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-13 17:28 UTC (permalink / raw)
  To: David Lechner
  Cc: Mark Brown, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Mon, 12 Feb 2024 17:26:45 -0600
David Lechner <dlechner@baylibre.com> wrote:

> This modifies the ad7380 ADC driver to use spi_optimize_message() to
> optimize the SPI message for the buffered read operation. Since buffered
> reads reuse the same SPI message for each read, this can improve
> performance by reducing the overhead of setting up some parts of the SPI
> message in each spi_sync() call.
> 
> Signed-off-by: David Lechner <dlechner@baylibre.com>
> ---
>  drivers/iio/adc/ad7380.c | 52 +++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 45 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
> index abd746aef868..5c5d2642a474 100644
> --- a/drivers/iio/adc/ad7380.c
> +++ b/drivers/iio/adc/ad7380.c
> @@ -133,6 +133,7 @@ struct ad7380_state {
>  	struct spi_device *spi;
>  	struct regulator *vref;
>  	struct regmap *regmap;
> +	struct spi_message *msg;
>  	/*
>  	 * DMA (thus cache coherency maintenance) requires the
>  	 * transfer buffers to live in their own cache lines.
> @@ -231,19 +232,55 @@ static int ad7380_debugfs_reg_access(struct iio_dev *indio_dev, u32 reg,
>  	return ret;
>  }
>  
> +static int ad7380_buffer_preenable(struct iio_dev *indio_dev)
> +{
> +	struct ad7380_state *st = iio_priv(indio_dev);
> +	struct spi_transfer *xfer;
> +	int ret;
> +
> +	st->msg = spi_message_alloc(1, GFP_KERNEL);

As it only ever has one element, is there a clear advantage over
just embedding the spi_message in the structure rather than
as a separate allocation? You'd need the transfer as well.

	spi_message_init_with_transfers(&st->msg, &st->trans, 1);

The transfer is then also available without walking the list (though
obviously you don't walk very far ;).

> +	if (!st->msg)
> +		return -ENOMEM;
> +
> +	xfer = list_first_entry(&st->msg->transfers, struct spi_transfer,
> +				transfer_list);
> +
> +	xfer->bits_per_word = st->chip_info->channels[0].scan_type.realbits;
> +	xfer->len = 4;
> +	xfer->rx_buf = st->scan_data.raw;
> +
> +	ret = spi_optimize_message(st->spi, st->msg);
> +	if (ret) {
> +		spi_message_free(st->msg);
Would avoid freeing explicitly here or later if it was embedded in
struct ad7380_state

Also, this doesn't seem very dynamic in general. Anything stopping this
being done at probe() as a one time thing?

> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static int ad7380_buffer_postdisable(struct iio_dev *indio_dev)
> +{
> +	struct ad7380_state *st = iio_priv(indio_dev);
> +
> +	spi_unoptimize_message(st->msg);
> +	spi_message_free(st->msg);
> +
> +	return 0;
> +}
> +
> +static const struct iio_buffer_setup_ops ad7380_buffer_setup_ops = {
> +	.preenable = ad7380_buffer_preenable,
> +	.postdisable = ad7380_buffer_postdisable,
> +};
> +
>  static irqreturn_t ad7380_trigger_handler(int irq, void *p)
>  {
>  	struct iio_poll_func *pf = p;
>  	struct iio_dev *indio_dev = pf->indio_dev;
>  	struct ad7380_state *st = iio_priv(indio_dev);
> -	struct spi_transfer xfer = {
> -		.bits_per_word = st->chip_info->channels[0].scan_type.realbits,
> -		.len = 4,
> -		.rx_buf = st->scan_data.raw,
> -	};
>  	int ret;
>  
> -	ret = spi_sync_transfer(st->spi, &xfer, 1);
> +	ret = spi_sync(st->spi, st->msg);
>  	if (ret)
>  		goto out;
>  
> @@ -420,7 +457,8 @@ static int ad7380_probe(struct spi_device *spi)
>  
>  	ret = devm_iio_triggered_buffer_setup(&spi->dev, indio_dev,
>  					      iio_pollfunc_store_time,
> -					      ad7380_trigger_handler, NULL);
> +					      ad7380_trigger_handler,
> +					      &ad7380_buffer_setup_ops);
>  	if (ret)
>  		return ret;
>  
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message()
  2024-02-13 16:08       ` Nuno Sá
@ 2024-02-13 17:31         ` Jonathan Cameron
  2024-02-13 18:59           ` David Lechner
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-13 17:31 UTC (permalink / raw)
  To: Nuno Sá
  Cc: David Lechner, Mark Brown, Martin Sperl, David Jander,
	Jonathan Cameron, Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, 13 Feb 2024 17:08:19 +0100
Nuno Sá <noname.nuno@gmail.com> wrote:

> On Tue, 2024-02-13 at 09:27 -0600, David Lechner wrote:
> > On Tue, Feb 13, 2024 at 3:47 AM Nuno Sá <noname.nuno@gmail.com> wrote:  
> > > 
> > > On Mon, 2024-02-12 at 17:26 -0600, David Lechner wrote:  
> > > > This modifies the ad7380 ADC driver to use spi_optimize_message() to
> > > > optimize the SPI message for the buffered read operation. Since buffered
> > > > reads reuse the same SPI message for each read, this can improve
> > > > > performance by reducing the overhead of setting up some parts of the SPI
> > > > message in each spi_sync() call.
> > > > 
> > > > Signed-off-by: David Lechner <dlechner@baylibre.com>
> > > > ---
> > > > >  drivers/iio/adc/ad7380.c | 52 +++++++++++++++++++++++++++++++++++++++++-------
> > > >  1 file changed, 45 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c
> > > > index abd746aef868..5c5d2642a474 100644
> > > > --- a/drivers/iio/adc/ad7380.c
> > > > +++ b/drivers/iio/adc/ad7380.c
> > > > @@ -133,6 +133,7 @@ struct ad7380_state {
> > > >       struct spi_device *spi;
> > > >       struct regulator *vref;
> > > >       struct regmap *regmap;
> > > > +     struct spi_message *msg;
> > > >       /*
> > > >        * DMA (thus cache coherency maintenance) requires the
> > > >        * transfer buffers to live in their own cache lines.
> > > > @@ -231,19 +232,55 @@ static int ad7380_debugfs_reg_access(struct iio_dev
> > > > *indio_dev, u32 reg,
> > > >       return ret;
> > > >  }
> > > > 
> > > > +static int ad7380_buffer_preenable(struct iio_dev *indio_dev)
> > > > +{
> > > > +     struct ad7380_state *st = iio_priv(indio_dev);
> > > > +     struct spi_transfer *xfer;
> > > > +     int ret;
> > > > +
> > > > +     st->msg = spi_message_alloc(1, GFP_KERNEL);
> > > > +     if (!st->msg)
> > > > +             return -ENOMEM;
> > > > +
> > > > +     xfer = list_first_entry(&st->msg->transfers, struct spi_transfer,
> > > > +                             transfer_list);
> > > > +
> > > > +     xfer->bits_per_word = st->chip_info->channels[0].scan_type.realbits;
> > > > +     xfer->len = 4;
> > > > +     xfer->rx_buf = st->scan_data.raw;
> > > > +
> > > > +     ret = spi_optimize_message(st->spi, st->msg);
> > > > +     if (ret) {
> > > > +             spi_message_free(st->msg);
> > > > +             return ret;
> > > > +     }
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +
> > > > +static int ad7380_buffer_postdisable(struct iio_dev *indio_dev)
> > > > +{
> > > > +     struct ad7380_state *st = iio_priv(indio_dev);
> > > > +
> > > > +     spi_unoptimize_message(st->msg);
> > > > +     spi_message_free(st->msg);
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +  
> > > 
> > > > Not such a big deal, but unless I'm missing something we could have the
> > > > spi_message (+ the transfer) statically allocated in struct ad7380_state
> > > > and do the optimize only once at probe (naturally with a proper devm
> > > > action for unoptimize). Then we would not need to do this for every
> > > > buffer enable + disable. I know in terms of performance it won't matter,
> > > > but it would be less code I guess.
> > > 
> > > Am I missing something?  
> > 
> > No, your understanding is correct for the current state of everything
> > in this series. So, we could do as you suggest, but I have a feeling
> > that future additions to this driver might require that it gets
> > changed back this way eventually.  
> 
> Hmm, not really sure about that as chip_info stuff is always our friend :). And
> I'm anyways of the opinion of keeping things simpler and start to evolve when
> really needed (because often we never really need to evolve). But bah, as I
> said... this is really not a big deal.
> 
Oops, I should have read Nuno's review before replying!

I'd rather we embedded it for now and did the optimization at probe.
Whilst it's a lot of work per transfer, it's not enough to worry about delaying
it until preenable().  It's easy to make that move and make it dynamic when
driver changes need it.  In the meantime, I don't want lots of other drivers
picking up this pattern when they may never need the complexity of
making things more dynamic.
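
A minimal user-space sketch of the embed-and-optimize-at-probe shape
discussed here. Everything prefixed model_ is a hypothetical stand-in so
the fragment compiles outside the kernel; in the driver the corresponding
calls would be spi_message_init_with_transfers(), spi_optimize_message()
and spi_unoptimize_message(), with the last one presumably run from a
devm action. It is an illustration of the pattern, not the driver code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel SPI types. */
struct model_transfer {
	void *rx_buf;
	unsigned int len;
	unsigned int bits_per_word;
};

struct model_message {
	struct model_transfer *xfers;
	unsigned int num_xfers;
	bool optimized;
};

/* Stand-in for spi_message_init_with_transfers(). */
static void model_message_init(struct model_message *msg,
			       struct model_transfer *xfers, unsigned int n)
{
	msg->xfers = xfers;
	msg->num_xfers = n;
	msg->optimized = false;
}

/* Stand-in for spi_optimize_message(): validate once, mark as pre-cooked. */
static int model_optimize(struct model_message *msg)
{
	for (unsigned int i = 0; i < msg->num_xfers; i++)
		if (!msg->xfers[i].len)
			return -EINVAL;

	msg->optimized = true;
	return 0;
}

/* Stand-in for spi_unoptimize_message(). */
static void model_unoptimize(struct model_message *msg)
{
	msg->optimized = false;
}

struct model_state {
	struct model_message msg;	/* embedded: no separate allocation */
	struct model_transfer xfer;	/* embedded: no list walk to reach it */
	unsigned char raw[4];
};

/* Probe-time setup: fill the transfer directly, then optimize once. */
static int model_probe(struct model_state *st, unsigned int realbits)
{
	st->xfer.bits_per_word = realbits;
	st->xfer.len = sizeof(st->raw);
	st->xfer.rx_buf = st->raw;

	model_message_init(&st->msg, &st->xfer, 1);

	return model_optimize(&st->msg);
}

/* What a cleanup action registered at probe would do on unbind. */
static void model_remove(struct model_state *st)
{
	model_unoptimize(&st->msg);
}
```

With this shape there is nothing to free on the error path, and the
trigger handler can keep reusing the same embedded message for every
sample.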

Jonathan

> - Nuno Sá
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/5] spi: move splitting transfers to spi_optimize_message()
  2024-02-12 23:26 ` [PATCH 2/5] spi: move splitting transfers to spi_optimize_message() David Lechner
@ 2024-02-13 17:35   ` Jonathan Cameron
  0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2024-02-13 17:35 UTC (permalink / raw)
  To: David Lechner
  Cc: Mark Brown, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Mon, 12 Feb 2024 17:26:42 -0600
David Lechner <dlechner@baylibre.com> wrote:

> Splitting transfers is an expensive operation so we can potentially
> optimize it by doing it only once per optimization of the message
> instead of repeating each time the message is transferred.
> 
> The transfer splitting functions are currently the only user of
> spi_res_alloc() so spi_res_release() can be safely moved at this time
> from spi_finalize_current_message() to spi_unoptimize_message().
> 
> The doc comments of the public functions for splitting transfers are
> also updated so that callers will know when it is safe to call them
> to ensure proper resource management.
> 
> Signed-off-by: David Lechner <dlechner@baylibre.com>
> ---
Trivial thing (which applies equally to the original code).
Otherwise LGTM.
FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> +/**
> + * spi_split_transfers - generic handling of transfer splitting
> + * @msg: the message to split
> + *
> + * Under certain conditions, a SPI controller may not support arbitrary
> + * transfer sizes or other features required by a peripheral. This function
> + * will split the transfers in the message into smaller transfers that are
> + * supported by the controller.
> + *
> + * Controllers with special requirements not covered here can also split
> + * transfers in the optimize_message() callback.
> + *
> + * Context: can sleep
> + * Return: zero on success, else a negative error code
> + */
> +static int spi_split_transfers(struct spi_message *msg)
> +{
> +	struct spi_controller *ctlr = msg->spi->controller;
> +	struct spi_transfer *xfer;
> +	int ret;
> +
> +	/*
> +	 * If an SPI controller does not support toggling the CS line on each
> +	 * transfer (indicated by the SPI_CS_WORD flag) or we are using a GPIO
> +	 * for the CS line, we can emulate the CS-per-word hardware function by
> +	 * splitting transfers into one-word transfers and ensuring that
> +	 * cs_change is set for each transfer.
> +	 */
> +	if ((msg->spi->mode & SPI_CS_WORD) && (!(ctlr->mode_bits & SPI_CS_WORD) ||
> +					       spi_is_csgpiod(msg->spi))) {
	if ((msg->spi->mode & SPI_CS_WORD) &&
	    (!(ctlr->mode_bits & SPI_CS_WORD) || spi_is_csgpiod(msg->spi))) {

Seems easier to read to me. I appreciate you are just moving it though so
don't mind that much if you leave it in the original form.



> +		ret = spi_split_transfers_maxwords(ctlr, msg, 1);
> +		if (ret)
> +			return ret;
> +
> +		list_for_each_entry(xfer, &msg->transfers, transfer_list) {
> +			/* Don't change cs_change on the last entry in the list */
> +			if (list_is_last(&xfer->transfer_list, &msg->transfers))
> +				break;
> +
> +			xfer->cs_change = 1;
> +		}
> +	} else {
> +		ret = spi_split_transfers_maxsize(ctlr, msg,
> +						  spi_max_transfer_size(msg->spi));
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
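
The decision in the quoted function can be modelled in user space as a
pure predicate plus the cs_change marking loop. This is only a sketch:
MODEL_CS_WORD is a hypothetical stand-in for SPI_CS_WORD with an
arbitrary value, and the array of flags stands in for the transfer list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the SPI_CS_WORD mode flag (value is arbitrary). */
#define MODEL_CS_WORD 0x1u

enum model_split {
	MODEL_SPLIT_MAXSIZE,	/* split by maximum transfer size only */
	MODEL_SPLIT_ONE_WORD,	/* emulate CS-per-word with one-word transfers */
};

/*
 * One-word splitting is needed when the peripheral asks for CS-per-word and
 * either the controller lacks the feature or the CS line is a GPIO.
 */
static enum model_split model_split_kind(unsigned int dev_mode,
					 unsigned int ctlr_mode_bits,
					 bool cs_is_gpio)
{
	if ((dev_mode & MODEL_CS_WORD) &&
	    (!(ctlr_mode_bits & MODEL_CS_WORD) || cs_is_gpio))
		return MODEL_SPLIT_ONE_WORD;

	return MODEL_SPLIT_MAXSIZE;
}

/* Set cs_change on every transfer except the last one. */
static void model_mark_cs_change(bool *cs_change, size_t n)
{
	for (size_t i = 0; i + 1 < n; i++)
		cs_change[i] = true;
}
```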


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-13  9:53   ` Nuno Sá
  2024-02-13 15:38     ` David Lechner
@ 2024-02-13 17:55     ` Mark Brown
  1 sibling, 0 replies; 21+ messages in thread
From: Mark Brown @ 2024-02-13 17:55 UTC (permalink / raw)
  To: Nuno Sá
  Cc: David Lechner, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, Feb 13, 2024 at 10:53:56AM +0100, Nuno Sá wrote:
> On Mon, 2024-02-12 at 17:26 -0600, David Lechner wrote:
> > This adds a new spi_optimize_message() function that can be used to
> > optimize SPI messages that are used more than once. Peripheral drivers
> > that use the same message multiple times can use this API to perform SPI
> > message validation and controller-specific optimizations once and then
> > reuse the message while avoiding the overhead of revalidating the

Please delete unneeded context from mails when replying.  Doing this
makes it much easier to find your reply in the message, helping ensure
it won't be missed by people scrolling through the irrelevant quoted
material.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-12 23:26 ` [PATCH 1/5] spi: add spi_optimize_message() APIs David Lechner
  2024-02-13  9:53   ` Nuno Sá
  2024-02-13 17:25   ` Jonathan Cameron
@ 2024-02-13 18:55   ` Mark Brown
  2024-02-13 19:26     ` David Lechner
  2 siblings, 1 reply; 21+ messages in thread
From: Mark Brown @ 2024-02-13 18:55 UTC (permalink / raw)
  To: David Lechner
  Cc: Martin Sperl, David Jander, Jonathan Cameron, Michael Hennerich,
	Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Mon, Feb 12, 2024 at 05:26:41PM -0600, David Lechner wrote:

> This adds a new spi_optimize_message() function that can be used to
> optimize SPI messages that are used more than once. Peripheral drivers
> that use the same message multiple times can use this API to perform SPI
> message validation and controller-specific optimizations once and then
> reuse the message while avoiding the overhead of revalidating the
> message on each spi_(a)sync() call.

This looks basically fine.  Some small comments:

> +/**
> + * __spi_unoptimize_message - shared implementation of spi_unoptimize_message()
> + *                            and spi_maybe_unoptimize_message()
> + * @msg: the message to unoptimize

There's no need for kerneldoc for internal only functions and it can
make the generated documentation a bit confusing for users.  Just skip
the /** for /*.

> +static int __spi_optimize_message(struct spi_device *spi,
> +				  struct spi_message *msg,
> +				  bool pre_optimized)
> +{
> +	struct spi_controller *ctlr = spi->controller;
> +	int ret;
> +
> +	ret = __spi_validate(spi, msg);
> +	if (ret)
> +		return ret;
> +
> +	if (ctlr->optimize_message) {
> +		ret = ctlr->optimize_message(msg);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	msg->pre_optimized = pre_optimized;

It would probably be clearer to name the parameter pre_optimising rather
than pre_optimized; as it is, the logic is a bit confusing.  Either that
or some comments.  A similar issue applies on the cleanup path.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message()
  2024-02-13 17:31         ` Jonathan Cameron
@ 2024-02-13 18:59           ` David Lechner
  0 siblings, 0 replies; 21+ messages in thread
From: David Lechner @ 2024-02-13 18:59 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Nuno Sá,
	Mark Brown, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, Feb 13, 2024 at 11:31 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
> On Tue, 13 Feb 2024 17:08:19 +0100
> Nuno Sá <noname.nuno@gmail.com> wrote:
>
> > On Tue, 2024-02-13 at 09:27 -0600, David Lechner wrote:
> > > On Tue, Feb 13, 2024 at 3:47 AM Nuno Sá <noname.nuno@gmail.com> wrote:
> > > >

...

> > > > Am I missing something?
> > >
> > > No, your understanding is correct for the current state of everything
> > > in this series. So, we could do as you suggest, but I have a feeling
> > > that future additions to this driver might require that it gets
> > > changed back this way eventually.
> >
> > Hmm, not really sure about that as chip_info stuff is always our friend :). And
> > I'm anyways of the opinion of keeping things simpler and start to evolve when
> > really needed (because often we never really need to evolve). But bah, as I
> > said... this is really not a big deal.
> >
> Oops should have read Nuno's review before replying!
>
> I'd rather we embedded it for now and did the optimization at probe.
> Whilst it's a lot of work per transfer it's not enough to worry about delaying
> it until preenable().  Easy to make that move and take it dynamic when
> driver changes need it.  In meantime, I don't want lots of other drivers
> picking up this pattern when they may never need the complexity of
> making things more dynamic.
>

Noted.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-13 17:25   ` Jonathan Cameron
@ 2024-02-13 19:20     ` David Lechner
  0 siblings, 0 replies; 21+ messages in thread
From: David Lechner @ 2024-02-13 19:20 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Mark Brown, Martin Sperl, David Jander, Jonathan Cameron,
	Michael Hennerich, Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, Feb 13, 2024 at 11:25 AM Jonathan Cameron
<Jonathan.Cameron@huawei.com> wrote:
>
>
> I thought about suggesting splitting this into an initial patch that just does
> the bits without the controller callbacks. Maybe it would work better that way
> with that introduced after the validate and splitting of transfers (so most
> of patches 1 and 2) as a patch 3 prior to the stm32 additions?

Unless anyone else feels the same way, I'm inclined to avoid the extra
work of splitting it up.


> > +static void __spi_unoptimize_message(struct spi_message *msg)
> > +{
> > +     struct spi_controller *ctlr = msg->spi->controller;
> > +
> > +     if (ctlr->unoptimize_message)
> > +             ctlr->unoptimize_message(msg);
> > +
> > +     msg->optimized = false;
> > +     msg->opt_state = NULL;
> > +}
>
> Seems misbalanced that this doesn't take a pre_optimized flag in but
> __spi_optimize does. I'd move handling that to outside the call in both cases.
>
>

Agreed.


> > @@ -4331,10 +4463,15 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message)
> >               return -ESHUTDOWN;
> >       }
> >
> > -     status = __spi_validate(spi, message);
> > -     if (status != 0)
> > +     status = spi_maybe_optimize_message(spi, message);
> > +     if (status)
> >               return status;
> >
> > +     /*
> > +      * NB: all return paths after this point must ensure that
> > +      * spi_finalize_current_message() is called to avoid leaking resources.
>
> I'm not sure a catch-all like that makes sense. Is it not sufficient to call
> the finer-grained spi_maybe_unoptimize_message()?

Hmm... this is my bias from a previous fix showing through. Maybe this
comment doesn't belong in this patch. The short answer to your
question is "it's complicated".

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-13 18:55   ` Mark Brown
@ 2024-02-13 19:26     ` David Lechner
  2024-02-13 19:28       ` Mark Brown
  0 siblings, 1 reply; 21+ messages in thread
From: David Lechner @ 2024-02-13 19:26 UTC (permalink / raw)
  To: Mark Brown
  Cc: Martin Sperl, David Jander, Jonathan Cameron, Michael Hennerich,
	Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, Feb 13, 2024 at 12:55 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Mon, Feb 12, 2024 at 05:26:41PM -0600, David Lechner wrote:
>
> > +static int __spi_optimize_message(struct spi_device *spi,
> > +                               struct spi_message *msg,
> > +                               bool pre_optimized)
> > +{
> > +     struct spi_controller *ctlr = spi->controller;
> > +     int ret;
> > +
> > +     ret = __spi_validate(spi, msg);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (ctlr->optimize_message) {
> > +             ret = ctlr->optimize_message(msg);
> > +             if (ret)
> > +                     return ret;
> > +     }
> > +
> > +     msg->pre_optimized = pre_optimized;
>
> It would probably be clearer to name the parameter pre_optimising rather
> than pre_optimized, as it is the logic is a bit confusing.  Either that
> or some comments.  A similar issue applies on the cleanup path.

Per Jonathan's suggestion, I plan to remove the parameter from this
function and handle this flag at the call site instead.
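
A rough user-space sketch of what handling the flag at the call site
could look like. All model_ names are hypothetical stand-ins, and the
final kernel code may well differ. The idea is that the shared helpers
only do the work, the driver-facing entry points own the pre_optimized
flag, and the "maybe" variants used on the sync/async path skip messages
that a driver has already pre-cooked.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant struct spi_message fields. */
struct model_msg {
	bool optimized;
	bool pre_optimized;
	int optimize_calls;	/* counts how often the expensive work ran */
};

/* Shared helper: does the work and takes no flag parameter. */
static int model_do_optimize(struct model_msg *msg)
{
	msg->optimize_calls++;
	msg->optimized = true;
	return 0;
}

/* Shared helper for the cleanup path, also without a flag parameter. */
static void model_do_unoptimize(struct model_msg *msg)
{
	msg->optimized = false;
}

/* Driver-facing entry point: the caller owns the optimization. */
static int model_optimize_message(struct model_msg *msg)
{
	int ret = model_do_optimize(msg);

	if (ret)
		return ret;

	msg->pre_optimized = true;	/* flag handled at the call site */
	return 0;
}

static void model_unoptimize_message(struct model_msg *msg)
{
	model_do_unoptimize(msg);
	msg->pre_optimized = false;
}

/* Core path: optimize only when the driver has not already done so. */
static int model_maybe_optimize_message(struct model_msg *msg)
{
	if (msg->pre_optimized)
		return 0;

	return model_do_optimize(msg);
}

/* Core path: leave pre-cooked messages alone when the transfer finishes. */
static void model_maybe_unoptimize_message(struct model_msg *msg)
{
	if (!msg->pre_optimized)
		model_do_unoptimize(msg);
}
```

This keeps the optimize and unoptimize helpers symmetric, which is the
imbalance raised in review.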

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/5] spi: add spi_optimize_message() APIs
  2024-02-13 19:26     ` David Lechner
@ 2024-02-13 19:28       ` Mark Brown
  0 siblings, 0 replies; 21+ messages in thread
From: Mark Brown @ 2024-02-13 19:28 UTC (permalink / raw)
  To: David Lechner
  Cc: Martin Sperl, David Jander, Jonathan Cameron, Michael Hennerich,
	Nuno Sá,
	Alain Volmat, Maxime Coquelin, Alexandre Torgue, linux-spi,
	linux-kernel, linux-stm32, linux-arm-kernel, linux-iio

On Tue, Feb 13, 2024 at 01:26:02PM -0600, David Lechner wrote:
> On Tue, Feb 13, 2024 at 12:55 PM Mark Brown <broonie@kernel.org> wrote:

> > It would probably be clearer to name the parameter pre_optimising rather
> > than pre_optimized, as it is the logic is a bit confusing.  Either that
> > or some comments.  A similar issue applies on the cleanup path.

> Per Jonathan's suggestion, I plan to remove the parameter from this
> function and handle this flag at the call site instead.

That works too.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-02-13 19:28 UTC | newest]

Thread overview: 21+ messages
2024-02-12 23:26 [PATCH 0/5] spi: add support for pre-cooking messages David Lechner
2024-02-12 23:26 ` [PATCH 1/5] spi: add spi_optimize_message() APIs David Lechner
2024-02-13  9:53   ` Nuno Sá
2024-02-13 15:38     ` David Lechner
2024-02-13 17:55     ` Mark Brown
2024-02-13 17:25   ` Jonathan Cameron
2024-02-13 19:20     ` David Lechner
2024-02-13 18:55   ` Mark Brown
2024-02-13 19:26     ` David Lechner
2024-02-13 19:28       ` Mark Brown
2024-02-12 23:26 ` [PATCH 2/5] spi: move splitting transfers to spi_optimize_message() David Lechner
2024-02-13 17:35   ` Jonathan Cameron
2024-02-12 23:26 ` [PATCH 3/5] spi: stm32: move splitting transfers to optimize_message David Lechner
2024-02-12 23:26 ` [PATCH 4/5] spi: axi-spi-engine: move message compile " David Lechner
2024-02-12 23:26 ` [PATCH 5/5] iio: adc: ad7380: use spi_optimize_message() David Lechner
2024-02-13  9:51   ` Nuno Sá
2024-02-13 15:27     ` David Lechner
2024-02-13 16:08       ` Nuno Sá
2024-02-13 17:31         ` Jonathan Cameron
2024-02-13 18:59           ` David Lechner
2024-02-13 17:28   ` Jonathan Cameron
