linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations
@ 2020-06-16  0:09 Robin Murphy
  2020-06-16  0:09 ` [PATCH 1/3] spi: bcm3835: Tidy up bcm2835_spi_reset_hw() Robin Murphy
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Robin Murphy @ 2020-06-16  0:09 UTC (permalink / raw)
  To: broonie, nsaenzjulienne, f.fainelli, rjui, sbranden
  Cc: bcm-kernel-feedback-list, linux-spi, linux-rpi-kernel,
	linux-arm-kernel, kernel, lukas, linux-kernel

Hi all,

Although Florian was concerned about a trivial inline check to deal with
shared IRQs adding overhead, the reality is that it would be so small as
to not be worth even thinking about unless the driver was already tuned
to squeeze out every last cycle. And a brief look over the code shows
that that clearly isn't the case.

This is an example of some of the easy low-hanging fruit that jumps out
just from code inspection. Based on disassembly and ARM1176 cycle
timings, patch #2 should save the equivalent of 2-3 shared interrupt
checks off the critical path in all cases, and patch #3 possibly up to
about 100x more. I don't have any means to test these patches, let alone
measure performance, so they're only backed by the principle that less
code - and in particular fewer memory accesses - is almost always
better.

There is almost certainly a *lot* more to be had from careful use of
relaxed I/O accessors, not doing a read-modify-write of CS at every
reset, tweaking the loops further to avoid unnecessary writebacks to
variables, and so on. However since I'm not invested in this personally
I'm not going to pursue it any further; I'm throwing these patches out
as more of a demonstration to back up my original drive-by review
comments, so if anyone want to pick them up and run with them then
please do so.

Robin.


Robin Murphy (3):
  spi: bcm3835: Tidy up bcm2835_spi_reset_hw()
  spi: bcm2835: Micro-optimise IRQ handler
  spi: bcm2835: Micro-optimise FIFO loops

 drivers/spi/spi-bcm2835.c | 45 +++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 23 deletions(-)

-- 
2.23.0.dirty


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] spi: bcm3835: Tidy up bcm2835_spi_reset_hw()
  2020-06-16  0:09 [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations Robin Murphy
@ 2020-06-16  0:09 ` Robin Murphy
  2020-06-16  0:09 ` [PATCH 2/3] spi: bcm2835: Micro-optimise IRQ handler Robin Murphy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Robin Murphy @ 2020-06-16  0:09 UTC (permalink / raw)
  To: broonie, nsaenzjulienne, f.fainelli, rjui, sbranden
  Cc: bcm-kernel-feedback-list, linux-spi, linux-rpi-kernel,
	linux-arm-kernel, kernel, lukas, linux-kernel

It doesn't need a struct spi_controller, and every callsite has
already retrieved the appropriate struct bcm2835_spi, so just pass
that directly.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/spi/spi-bcm2835.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 237bd306c268..524a91e52111 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -335,9 +335,8 @@ static inline void bcm2835_wr_fifo_blind(struct bcm2835_spi *bs, int count)
 	}
 }
 
-static void bcm2835_spi_reset_hw(struct spi_controller *ctlr)
+static void bcm2835_spi_reset_hw(struct bcm2835_spi *bs)
 {
-	struct bcm2835_spi *bs = spi_controller_get_devdata(ctlr);
 	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
 
 	/* Disable SPI interrupts and transfer */
@@ -386,7 +385,7 @@ static irqreturn_t bcm2835_spi_interrupt(int irq, void *dev_id)
 
 	if (!bs->rx_len) {
 		/* Transfer complete - reset SPI HW */
-		bcm2835_spi_reset_hw(ctlr);
+		bcm2835_spi_reset_hw(bs);
 		/* wake up the framework */
 		complete(&ctlr->xfer_completion);
 	}
@@ -607,7 +606,7 @@ static void bcm2835_spi_dma_rx_done(void *data)
 	bcm2835_spi_undo_prologue(bs);
 
 	/* reset fifo and HW */
-	bcm2835_spi_reset_hw(ctlr);
+	bcm2835_spi_reset_hw(bs);
 
 	/* and mark as completed */;
 	complete(&ctlr->xfer_completion);
@@ -641,7 +640,7 @@ static void bcm2835_spi_dma_tx_done(void *data)
 		dmaengine_terminate_async(ctlr->dma_rx);
 
 	bcm2835_spi_undo_prologue(bs);
-	bcm2835_spi_reset_hw(ctlr);
+	bcm2835_spi_reset_hw(bs);
 	complete(&ctlr->xfer_completion);
 }
 
@@ -825,14 +824,14 @@ static int bcm2835_spi_transfer_one_dma(struct spi_controller *ctlr,
 	if (!bs->rx_buf && !bs->tx_dma_active &&
 	    cmpxchg(&bs->rx_dma_active, true, false)) {
 		dmaengine_terminate_async(ctlr->dma_rx);
-		bcm2835_spi_reset_hw(ctlr);
+		bcm2835_spi_reset_hw(bs);
 	}
 
 	/* wait for wakeup in framework */
 	return 1;
 
 err_reset_hw:
-	bcm2835_spi_reset_hw(ctlr);
+	bcm2835_spi_reset_hw(bs);
 	bcm2835_spi_undo_prologue(bs);
 	return ret;
 }
@@ -1074,7 +1073,7 @@ static int bcm2835_spi_transfer_one_poll(struct spi_controller *ctlr,
 	}
 
 	/* Transfer complete - reset SPI HW */
-	bcm2835_spi_reset_hw(ctlr);
+	bcm2835_spi_reset_hw(bs);
 	/* and return without waiting for completion */
 	return 0;
 }
@@ -1182,7 +1181,7 @@ static void bcm2835_spi_handle_err(struct spi_controller *ctlr,
 	bcm2835_spi_undo_prologue(bs);
 
 	/* and reset */
-	bcm2835_spi_reset_hw(ctlr);
+	bcm2835_spi_reset_hw(bs);
 }
 
 static int chip_match_name(struct gpio_chip *chip, void *data)
-- 
2.23.0.dirty


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] spi: bcm2835: Micro-optimise IRQ handler
  2020-06-16  0:09 [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations Robin Murphy
  2020-06-16  0:09 ` [PATCH 1/3] spi: bcm3835: Tidy up bcm2835_spi_reset_hw() Robin Murphy
@ 2020-06-16  0:09 ` Robin Murphy
  2020-06-16  0:09 ` [PATCH 3/3] spi: bcm2835: Micro-optimise FIFO loops Robin Murphy
  2020-07-01 22:24 ` [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations Mark Brown
  3 siblings, 0 replies; 5+ messages in thread
From: Robin Murphy @ 2020-06-16  0:09 UTC (permalink / raw)
  To: broonie, nsaenzjulienne, f.fainelli, rjui, sbranden
  Cc: bcm-kernel-feedback-list, linux-spi, linux-rpi-kernel,
	linux-arm-kernel, kernel, lukas, linux-kernel

The IRQ handler only needs the struct spi_controller for the sake of
the completion at the end of a transfer. Passing the struct bcm2835_spi
directly as the IRQ data allows that level of indirection to be pushed
into the completion path for the reverse lookup, and avoided entirely
in all other cases.

This saves one explicit load in the critical path, plus (for a GCC 8.3
build) two registers worth of stack frame overhead.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/spi/spi-bcm2835.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 524a91e52111..aec70ac8911e 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -86,6 +86,7 @@ MODULE_PARM_DESC(polling_limit_us,
  * @clk: core clock, divided to calculate serial clock
  * @irq: interrupt, signals TX FIFO empty or RX FIFO ¾ full
  * @tfr: SPI transfer currently processed
+ * @ctlr: SPI controller reverse lookup
  * @tx_buf: pointer whence next transmitted byte is read
  * @rx_buf: pointer where next received byte is written
  * @tx_len: remaining bytes to transmit
@@ -125,6 +126,7 @@ struct bcm2835_spi {
 	struct clk *clk;
 	int irq;
 	struct spi_transfer *tfr;
+	struct spi_controller *ctlr;
 	const u8 *tx_buf;
 	u8 *rx_buf;
 	int tx_len;
@@ -362,8 +364,7 @@ static void bcm2835_spi_reset_hw(struct bcm2835_spi *bs)
 
 static irqreturn_t bcm2835_spi_interrupt(int irq, void *dev_id)
 {
-	struct spi_controller *ctlr = dev_id;
-	struct bcm2835_spi *bs = spi_controller_get_devdata(ctlr);
+	struct bcm2835_spi *bs = dev_id;
 	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
 
 	/*
@@ -387,7 +388,7 @@ static irqreturn_t bcm2835_spi_interrupt(int irq, void *dev_id)
 		/* Transfer complete - reset SPI HW */
 		bcm2835_spi_reset_hw(bs);
 		/* wake up the framework */
-		complete(&ctlr->xfer_completion);
+		complete(&bs->ctlr->xfer_completion);
 	}
 
 	return IRQ_HANDLED;
@@ -1310,6 +1311,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 	ctlr->dev.of_node = pdev->dev.of_node;
 
 	bs = spi_controller_get_devdata(ctlr);
+	bs->ctlr = ctlr;
 
 	bs->regs = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(bs->regs)) {
@@ -1344,7 +1346,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 		   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);
 
 	err = devm_request_irq(&pdev->dev, bs->irq, bcm2835_spi_interrupt, 0,
-			       dev_name(&pdev->dev), ctlr);
+			       dev_name(&pdev->dev), bs);
 	if (err) {
 		dev_err(&pdev->dev, "could not request IRQ: %d\n", err);
 		goto out_dma_release;
-- 
2.23.0.dirty


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] spi: bcm2835: Micro-optimise FIFO loops
  2020-06-16  0:09 [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations Robin Murphy
  2020-06-16  0:09 ` [PATCH 1/3] spi: bcm3835: Tidy up bcm2835_spi_reset_hw() Robin Murphy
  2020-06-16  0:09 ` [PATCH 2/3] spi: bcm2835: Micro-optimise IRQ handler Robin Murphy
@ 2020-06-16  0:09 ` Robin Murphy
  2020-07-01 22:24 ` [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations Mark Brown
  3 siblings, 0 replies; 5+ messages in thread
From: Robin Murphy @ 2020-06-16  0:09 UTC (permalink / raw)
  To: broonie, nsaenzjulienne, f.fainelli, rjui, sbranden
  Cc: bcm-kernel-feedback-list, linux-spi, linux-rpi-kernel,
	linux-arm-kernel, kernel, lukas, linux-kernel

The blind and counted loops are always called with nonzero count, so
convert them to do-while loops that lead to slightly more efficient
code generation. With GCC 8.3 this shaves off 1-2 instructions per
iteration in each case.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/spi/spi-bcm2835.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index aec70ac8911e..e9a91adf03eb 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -245,13 +245,13 @@ static inline void bcm2835_rd_fifo_count(struct bcm2835_spi *bs, int count)
 
 	bs->rx_len -= count;
 
-	while (count > 0) {
+	do {
 		val = bcm2835_rd(bs, BCM2835_SPI_FIFO);
 		len = min(count, 4);
 		memcpy(bs->rx_buf, &val, len);
 		bs->rx_buf += len;
 		count -= 4;
-	}
+	} while (count > 0);
 }
 
 /**
@@ -271,7 +271,7 @@ static inline void bcm2835_wr_fifo_count(struct bcm2835_spi *bs, int count)
 
 	bs->tx_len -= count;
 
-	while (count > 0) {
+	do {
 		if (bs->tx_buf) {
 			len = min(count, 4);
 			memcpy(&val, bs->tx_buf, len);
@@ -281,7 +281,7 @@ static inline void bcm2835_wr_fifo_count(struct bcm2835_spi *bs, int count)
 		}
 		bcm2835_wr(bs, BCM2835_SPI_FIFO, val);
 		count -= 4;
-	}
+	} while (count > 0);
 }
 
 /**
@@ -310,12 +310,11 @@ static inline void bcm2835_rd_fifo_blind(struct bcm2835_spi *bs, int count)
 	count = min(count, bs->rx_len);
 	bs->rx_len -= count;
 
-	while (count) {
+	do {
 		val = bcm2835_rd(bs, BCM2835_SPI_FIFO);
 		if (bs->rx_buf)
 			*bs->rx_buf++ = val;
-		count--;
-	}
+	} while (--count);
 }
 
 /**
@@ -330,11 +329,10 @@ static inline void bcm2835_wr_fifo_blind(struct bcm2835_spi *bs, int count)
 	count = min(count, bs->tx_len);
 	bs->tx_len -= count;
 
-	while (count) {
+	do {
 		val = bs->tx_buf ? *bs->tx_buf++ : 0;
 		bcm2835_wr(bs, BCM2835_SPI_FIFO, val);
-		count--;
-	}
+	} while (--count);
 }
 
 static void bcm2835_spi_reset_hw(struct bcm2835_spi *bs)
-- 
2.23.0.dirty


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations
  2020-06-16  0:09 [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations Robin Murphy
                   ` (2 preceding siblings ...)
  2020-06-16  0:09 ` [PATCH 3/3] spi: bcm2835: Micro-optimise FIFO loops Robin Murphy
@ 2020-07-01 22:24 ` Mark Brown
  3 siblings, 0 replies; 5+ messages in thread
From: Mark Brown @ 2020-07-01 22:24 UTC (permalink / raw)
  To: f.fainelli, rjui, nsaenzjulienne, sbranden, Robin Murphy
  Cc: linux-arm-kernel, bcm-kernel-feedback-list, linux-kernel, lukas,
	kernel, linux-rpi-kernel, linux-spi

On Tue, 16 Jun 2020 01:09:26 +0100, Robin Murphy wrote:
> Although Florian was concerned about a trivial inline check to deal with
> shared IRQs adding overhead, the reality is that it would be so small as
> to not be worth even thinking about unless the driver was already tuned
> to squeeze out every last cycle. And a brief look over the code shows
> that that clearly isn't the case.
> 
> This is an example of some of the easy low-hanging fruit that jumps out
> just from code inspection. Based on disassembly and ARM1176 cycle
> timings, patch #2 should save the equivalent of 2-3 shared interrupt
> checks off the critical path in all cases, and patch #3 possibly up to
> about 100x more. I don't have any means to test these patches, let alone
> measure performance, so they're only backed by the principle that less
> code - and in particular fewer memory accesses - is almost always
> better.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/3] spi: bcm3835: Tidy up bcm2835_spi_reset_hw()
      commit: ac4648b5d866f98feef4525ae8734972359e4edd
[2/3] spi: bcm2835: Micro-optimise IRQ handler
      commit: afe7e36360f4c981fc03ef07a81cb4ce3d567325
[3/3] spi: bcm2835: Micro-optimise FIFO loops
      commit: 26751de25d255eab7132a8024a893609456996e6

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-07-01 22:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-16  0:09 [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations Robin Murphy
2020-06-16  0:09 ` [PATCH 1/3] spi: bcm3835: Tidy up bcm2835_spi_reset_hw() Robin Murphy
2020-06-16  0:09 ` [PATCH 2/3] spi: bcm2835: Micro-optimise IRQ handler Robin Murphy
2020-06-16  0:09 ` [PATCH 3/3] spi: bcm2835: Micro-optimise FIFO loops Robin Murphy
2020-07-01 22:24 ` [PATCH 0/3] spi: bcm2835: Interrupt-handling optimisations Mark Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).