All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Raspberry Pi spi0 improvements
@ 2018-11-08  7:06 Lukas Wunner
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2018-11-08  7:06 UTC (permalink / raw)
  To: Mark Brown, Eric Anholt, Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

Here's a first batch of improvements for the spi0 master on the
Raspberry Pi.  The meat of the series is in its last two patches:

* Patch [6/7] allows DMA for transfer buffers starting at an offset not a
  multiple of 4.  This overcomes a limitation affecting Ethernet drivers
  such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
  deliberately unaligned receive buffers.

* Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
  is known to contain data, or the TX FIFO when it is known to have free
  space.

The preceding patches fix rarely encountered bugs, remove obsolete code
and add documentation.

The series has been tested extensively on the "Revolution Pi" family of
open source PLCs (https://revolution.kunbus.com/), but further testing
would be welcome to raise the confidence.

Thanks,

Lukas


Lukas Wunner (7):
  spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
  spi: bcm2835: Fix book-keeping of DMA termination
  spi: bcm2835: Fix race on DMA termination
  spi: bcm2835: Drop unused code for native Chip Select
  spi: bcm2835: Document struct bcm2835_spi
  spi: bcm2835: Overcome sglist entry length limitation
  spi: bcm2835: Speed up FIFO access if fill level is known

 drivers/spi/spi-bcm2835.c | 478 ++++++++++++++++++++++++++------------
 1 file changed, 334 insertions(+), 144 deletions(-)

-- 
2.19.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
  2018-11-08  7:06   ` [PATCH 4/7] spi: bcm2835: Drop unused code for native Chip Select Lukas Wunner
@ 2018-11-08  7:06   ` Lukas Wunner
  2018-11-08  7:06   ` [PATCH 3/7] spi: bcm2835: Fix race on DMA termination Lukas Wunner
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Lukas Wunner @ 2018-11-08  7:06 UTC (permalink / raw)
  To: Mark Brown, Eric Anholt, Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

The IRQ handler bcm2835_spi_interrupt() first reads as much as possible
from the RX FIFO, then writes as much as possible to the TX FIFO.
Afterwards it decides whether the transfer is finished by checking if
the TX FIFO is empty.

If very few bytes were written to the TX FIFO, they may already have
been transmitted by the time the FIFO's emptiness is checked.  As a
result, the transfer will be declared finished and the chip will be
reset without reading the corresponding received bytes from the RX FIFO.

The odds of this happening increase with a high clock frequency (such
that the TX FIFO drains quickly) and either passing "threadirqs" on the
command line or enabling CONFIG_PREEMPT_RT_BASE (such that the IRQ
handler may be preempted between filling the TX FIFO and checking its
emptiness).

Fix by instead checking whether rx_len has reached zero, which means
that the transfer has been received in full.  This is also more
efficient as it avoids one bus read access per interrupt.  Note that
bcm2835_spi_transfer_one_poll() likewise uses rx_len to determine
whether the transfer has finished.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Fixes: e34ff011c70e ("spi: bcm2835: move to the transfer_one driver model")
Cc: stable@vger.kernel.org # v4.1+
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi-bcm2835.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index f35cc10772f6..fa7844a956c7 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -155,8 +155,7 @@ static irqreturn_t bcm2835_spi_interrupt(int irq, void *dev_id)
 	/* Write as many bytes as possible to FIFO */
 	bcm2835_wr_fifo(bs);
 
-	/* based on flags decide if we can finish the transfer */
-	if (bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_DONE) {
+	if (!bs->rx_len) {
 		/* Transfer complete - reset SPI HW */
 		bcm2835_spi_reset_hw(master);
 		/* wake up the framework */
-- 
2.19.1


_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/7] spi: bcm2835: Fix book-keeping of DMA termination
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
                     ` (3 preceding siblings ...)
  2018-11-08  7:06   ` [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known Lukas Wunner
@ 2018-11-08  7:06   ` Lukas Wunner
  2018-11-08  7:06   ` [PATCH 5/7] spi: bcm2835: Document struct bcm2835_spi Lukas Wunner
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Lukas Wunner @ 2018-11-08  7:06 UTC (permalink / raw)
  To: Mark Brown, Eric Anholt, Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

If submission of a DMA TX transfer succeeds but submission of the
corresponding RX transfer does not, the BCM2835 SPI driver terminates
the TX transfer but neglects to reset the dma_pending flag to false.

Thus, if the next transfer uses interrupt mode (because it is shorter
than BCM2835_SPI_DMA_MIN_LENGTH) and runs into a timeout,
dmaengine_terminate_all() will be called both for TX (once more) and
for RX (which was never started in the first place).  Fix it.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Fixes: 3ecd37edaa2a ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
Cc: stable@vger.kernel.org # v4.2+
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi-bcm2835.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index fa7844a956c7..594e9712ecbc 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -341,6 +341,7 @@ static int bcm2835_spi_transfer_one_dma(struct spi_master *master,
 	if (ret) {
 		/* need to reset on errors */
 		dmaengine_terminate_all(master->dma_tx);
+		bs->dma_pending = false;
 		bcm2835_spi_reset_hw(master);
 		return ret;
 	}
-- 
2.19.1


_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/7] spi: bcm2835: Fix race on DMA termination
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
  2018-11-08  7:06   ` [PATCH 4/7] spi: bcm2835: Drop unused code for native Chip Select Lukas Wunner
  2018-11-08  7:06   ` [PATCH 1/7] spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode Lukas Wunner
@ 2018-11-08  7:06   ` Lukas Wunner
  2018-11-08  7:06   ` [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known Lukas Wunner
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Lukas Wunner @ 2018-11-08  7:06 UTC (permalink / raw)
  To: Mark Brown, Eric Anholt, Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

If a DMA transfer finishes orderly right when spi_transfer_one_message()
determines that it has timed out, the callbacks bcm2835_spi_dma_done()
and bcm2835_spi_handle_err() race to call dmaengine_terminate_all(),
potentially leading to double termination.

Prevent by atomically changing the dma_pending flag before calling
dmaengine_terminate_all().

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Fixes: 3ecd37edaa2a ("spi: bcm2835: enable dma modes for transfers meeting certain conditions")
Cc: stable@vger.kernel.org # v4.2+
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi-bcm2835.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 594e9712ecbc..774161bbcb2e 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -232,10 +232,9 @@ static void bcm2835_spi_dma_done(void *data)
 	 * is called the tx-dma must have finished - can't get to this
 	 * situation otherwise...
 	 */
-	dmaengine_terminate_all(master->dma_tx);
-
-	/* mark as no longer pending */
-	bs->dma_pending = 0;
+	if (cmpxchg(&bs->dma_pending, true, false)) {
+		dmaengine_terminate_all(master->dma_tx);
+	}
 
 	/* and mark as completed */;
 	complete(&master->xfer_completion);
@@ -617,10 +616,9 @@ static void bcm2835_spi_handle_err(struct spi_master *master,
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
 
 	/* if an error occurred and we have an active dma, then terminate */
-	if (bs->dma_pending) {
+	if (cmpxchg(&bs->dma_pending, true, false)) {
 		dmaengine_terminate_all(master->dma_tx);
 		dmaengine_terminate_all(master->dma_rx);
-		bs->dma_pending = 0;
 	}
 	/* and reset */
 	bcm2835_spi_reset_hw(master);
-- 
2.19.1


_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/7] spi: bcm2835: Drop unused code for native Chip Select
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
@ 2018-11-08  7:06   ` Lukas Wunner
       [not found]     ` <a24869503ed4e867b11c66c8615a4d5cddb3b2b5.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
  2018-11-08  7:06   ` [PATCH 1/7] spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode Lukas Wunner
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2018-11-08  7:06 UTC (permalink / raw)
  To: Mark Brown, Eric Anholt, Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

Commit a30a555d7435 ("spi: bcm2835: transform native-cs to gpio-cs on
first spi_setup") disabled the use of hardware-controlled native Chip
Select in favour of software-controlled GPIO Chip Select but left code
to support the former untouched.  Remove it to simplify the driver and
ease the addition of new features and further optimizations.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi-bcm2835.c | 96 ++++++---------------------------------
 1 file changed, 13 insertions(+), 83 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 774161bbcb2e..b2febd8e806e 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -172,28 +172,16 @@ static int bcm2835_spi_transfer_one_irq(struct spi_master *master,
 {
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
 
-	/* fill in fifo if we have gpio-cs
-	 * note that there have been rare events where the native-CS
-	 * flapped for <1us which may change the behaviour
-	 * with gpio-cs this does not happen, so it is implemented
-	 * only for this case
-	 */
-	if (gpio_is_valid(spi->cs_gpio)) {
-		/* enable HW block, but without interrupts enabled
-		 * this would triggern an immediate interrupt
-		 */
-		bcm2835_wr(bs, BCM2835_SPI_CS,
-			   cs | BCM2835_SPI_CS_TA);
-		/* fill in tx fifo as much as possible */
-		bcm2835_wr_fifo(bs);
-	}
-
 	/*
-	 * Enable the HW block. This will immediately trigger a DONE (TX
-	 * empty) interrupt, upon which we will fill the TX FIFO with the
-	 * first TX bytes. Pre-filling the TX FIFO here to avoid the
-	 * interrupt doesn't work:-(
+	 * Enable HW block, but with interrupts still disabled.
+	 * Otherwise the empty TX FIFO would immediately trigger an interrupt.
 	 */
+	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
+
+	/* fill TX FIFO as much as possible */
+	bcm2835_wr_fifo(bs);
+
+	/* enable interrupts */
 	cs |= BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD | BCM2835_SPI_CS_TA;
 	bcm2835_wr(bs, BCM2835_SPI_CS, cs);
 
@@ -356,10 +344,6 @@ static bool bcm2835_spi_can_dma(struct spi_master *master,
 				struct spi_device *spi,
 				struct spi_transfer *tfr)
 {
-	/* only run for gpio_cs */
-	if (!gpio_is_valid(spi->cs_gpio))
-		return false;
-
 	/* we start DMA efforts only on bigger transfers */
 	if (tfr->len < BCM2835_SPI_DMA_MIN_LENGTH)
 		return false;
@@ -559,12 +543,12 @@ static int bcm2835_spi_transfer_one(struct spi_master *master,
 	else
 		cs &= ~BCM2835_SPI_CS_REN;
 
-	/* for gpio_cs set dummy CS so that no HW-CS get changed
-	 * we can not run this in bcm2835_spi_set_cs, as it does
-	 * not get called for cs_gpio cases, so we need to do it here
+	/*
+	 * The driver always uses software-controlled GPIO Chip Select.
+	 * Set the hardware-controlled native Chip Select to an invalid
+	 * value to prevent it from interfering.
 	 */
-	if (gpio_is_valid(spi->cs_gpio) || (spi->mode & SPI_NO_CS))
-		cs |= BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01;
+	cs |= BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01;
 
 	/* set transmit buffers and length */
 	bs->tx_buf = tfr->tx_buf;
@@ -624,59 +608,6 @@ static void bcm2835_spi_handle_err(struct spi_master *master,
 	bcm2835_spi_reset_hw(master);
 }
 
-static void bcm2835_spi_set_cs(struct spi_device *spi, bool gpio_level)
-{
-	/*
-	 * we can assume that we are "native" as per spi_set_cs
-	 *   calling us ONLY when cs_gpio is not set
-	 * we can also assume that we are CS < 3 as per bcm2835_spi_setup
-	 *   we would not get called because of error handling there.
-	 * the level passed is the electrical level not enabled/disabled
-	 *   so it has to get translated back to enable/disable
-	 *   see spi_set_cs in spi.c for the implementation
-	 */
-
-	struct spi_master *master = spi->master;
-	struct bcm2835_spi *bs = spi_master_get_devdata(master);
-	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
-	bool enable;
-
-	/* calculate the enable flag from the passed gpio_level */
-	enable = (spi->mode & SPI_CS_HIGH) ? gpio_level : !gpio_level;
-
-	/* set flags for "reverse" polarity in the registers */
-	if (spi->mode & SPI_CS_HIGH) {
-		/* set the correct CS-bits */
-		cs |= BCM2835_SPI_CS_CSPOL;
-		cs |= BCM2835_SPI_CS_CSPOL0 << spi->chip_select;
-	} else {
-		/* clean the CS-bits */
-		cs &= ~BCM2835_SPI_CS_CSPOL;
-		cs &= ~(BCM2835_SPI_CS_CSPOL0 << spi->chip_select);
-	}
-
-	/* select the correct chip_select depending on disabled/enabled */
-	if (enable) {
-		/* set cs correctly */
-		if (spi->mode & SPI_NO_CS) {
-			/* use the "undefined" chip-select */
-			cs |= BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01;
-		} else {
-			/* set the chip select */
-			cs &= ~(BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01);
-			cs |= spi->chip_select;
-		}
-	} else {
-		/* disable CSPOL which puts HW-CS into deselected state */
-		cs &= ~BCM2835_SPI_CS_CSPOL;
-		/* use the "undefined" chip-select as precaution */
-		cs |= BCM2835_SPI_CS_CS_10 | BCM2835_SPI_CS_CS_01;
-	}
-
-	/* finally set the calculated flags in SPI_CS */
-	bcm2835_wr(bs, BCM2835_SPI_CS, cs);
-}
-
 static int chip_match_name(struct gpio_chip *chip, void *data)
 {
 	return !strcmp(chip->label, data);
@@ -748,7 +679,6 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 	master->bits_per_word_mask = SPI_BPW_MASK(8);
 	master->num_chipselect = 3;
 	master->setup = bcm2835_spi_setup;
-	master->set_cs = bcm2835_spi_set_cs;
 	master->transfer_one = bcm2835_spi_transfer_one;
 	master->handle_err = bcm2835_spi_handle_err;
 	master->prepare_message = bcm2835_spi_prepare_message;
-- 
2.19.1


_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/7] spi: bcm2835: Document struct bcm2835_spi
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
                     ` (4 preceding siblings ...)
  2018-11-08  7:06   ` [PATCH 2/7] spi: bcm2835: Fix book-keeping of DMA termination Lukas Wunner
@ 2018-11-08  7:06   ` Lukas Wunner
  2018-11-08  7:06   ` [PATCH 6/7] spi: bcm2835: Overcome sglist entry length limitation Lukas Wunner
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Lukas Wunner @ 2018-11-08  7:06 UTC (permalink / raw)
  To: Mark Brown, Eric Anholt, Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

Document the driver's data structure to lower the barrier to entry for
contributors.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi-bcm2835.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index b2febd8e806e..9b9b9926a956 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -80,6 +80,17 @@
 
 #define DRV_NAME	"spi-bcm2835"
 
+/**
+ * struct bcm2835_spi - BCM2835 SPI controller
+ * @regs: base address of register map
+ * @clk: core clock, divided to calculate serial clock
+ * @irq: interrupt, signals TX FIFO empty or RX FIFO ¾ full
+ * @tx_buf: pointer whence next transmitted byte is read
+ * @rx_buf: pointer where next received byte is written
+ * @tx_len: remaining bytes to transmit
+ * @rx_len: remaining bytes to receive
+ * @dma_pending: whether a DMA transfer is in progress
+ */
 struct bcm2835_spi {
 	void __iomem *regs;
 	struct clk *clk;
-- 
2.19.1


_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] spi: bcm2835: Overcome sglist entry length limitation
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
                     ` (5 preceding siblings ...)
  2018-11-08  7:06   ` [PATCH 5/7] spi: bcm2835: Document struct bcm2835_spi Lukas Wunner
@ 2018-11-08  7:06   ` Lukas Wunner
       [not found]     ` <eb5ce210b06fb68580961038412f9499c3e56a76.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
  2018-11-10  9:13   ` [PATCH 0/7] Raspberry Pi spi0 improvements kernel-TqfNSX0MhmxHKSADF0wUEw
  2018-11-14  5:12   ` Florian Fainelli
  8 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2018-11-08  7:06 UTC (permalink / raw)
  To: Mark Brown, Eric Anholt, Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

When in DMA mode, the BCM2835 SPI controller requires that the FIFO is
accessed in 4 byte chunks.  This rule is not fulfilled if a transfer
consists of multiple sglist entries, one per page, and the first entry
starts in the middle of a page with an offset not a multiple of 4.

The driver currently falls back to programmed I/O for such transfers,
incurring a significant performance penalty.

Overcome this hardware limitation by transferring the first few bytes of
a transfer without DMA such that the remainder of the first sglist entry
becomes a multiple of 4.  Specifics are provided in kerneldoc comments.

An alternative approach would have been to split transfers in the
->prepare_message hook, but this may necessitate two transfers per page,
defeating the goal of clustering multiple pages together in a single
transfer for efficiency.  E.g. if the first TX sglist entry's length is
23 and the first RX's is 40, the first transfer would send and receive
23 bytes, the second 40 - 23 = 17 bytes, the third 4096 - 17 = 4079
bytes, the fourth 4096 - 4079 = 17 bytes and so on.  In other words,
O(n) transfers are necessary (n = number of sglist entries), whereas
the algorithm implemented herein only requires O(1) additional work.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi-bcm2835.c | 291 +++++++++++++++++++++++++++++++-------
 1 file changed, 242 insertions(+), 49 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 9b9b9926a956..36719d2cc12d 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -85,20 +85,30 @@
  * @regs: base address of register map
  * @clk: core clock, divided to calculate serial clock
  * @irq: interrupt, signals TX FIFO empty or RX FIFO ¾ full
+ * @tfr: SPI transfer currently processed
  * @tx_buf: pointer whence next transmitted byte is read
  * @rx_buf: pointer where next received byte is written
  * @tx_len: remaining bytes to transmit
  * @rx_len: remaining bytes to receive
+ * @tx_prologue: bytes transmitted without DMA if first TX sglist entry's
+ *	length is not a multiple of 4 (to overcome hardware limitation)
+ * @rx_prologue: bytes received without DMA if first RX sglist entry's
+ *	length is not a multiple of 4 (to overcome hardware limitation)
+ * @tx_spillover: whether @tx_prologue spills over to second TX sglist entry
  * @dma_pending: whether a DMA transfer is in progress
  */
 struct bcm2835_spi {
 	void __iomem *regs;
 	struct clk *clk;
 	int irq;
+	struct spi_transfer *tfr;
 	const u8 *tx_buf;
 	u8 *rx_buf;
 	int tx_len;
 	int rx_len;
+	int tx_prologue;
+	int rx_prologue;
+	bool tx_spillover;
 	bool dma_pending;
 };
 
@@ -137,6 +147,72 @@ static inline void bcm2835_wr_fifo(struct bcm2835_spi *bs)
 	}
 }
 
+/**
+ * bcm2835_rd_fifo_count() - blindly read exactly @count bytes from RX FIFO
+ * @bs: BCM2835 SPI controller
+ * @count: bytes to read from RX FIFO
+ *
+ * The caller must ensure that @bs->rx_len is greater than or equal to @count,
+ * that the RX FIFO contains at least @count bytes and that the DMA Enable flag
+ * in the CS register is set (such that a read from the FIFO register receives
+ * 32-bit instead of just 8-bit).
+ */
+static inline void bcm2835_rd_fifo_count(struct bcm2835_spi *bs, int count)
+{
+	u32 val;
+
+	bs->rx_len -= count;
+
+	while (count > 0) {
+		val = bcm2835_rd(bs, BCM2835_SPI_FIFO);
+		if (bs->rx_buf) {
+			int len = min(count, 4);
+			memcpy(bs->rx_buf, &val, len);
+			bs->rx_buf += len;
+		}
+		count -= 4;
+	}
+}
+
+/**
+ * bcm2835_wr_fifo_count() - blindly write exactly @count bytes to TX FIFO
+ * @bs: BCM2835 SPI controller
+ * @count: bytes to write to TX FIFO
+ *
+ * The caller must ensure that @bs->tx_len is greater than or equal to @count,
+ * that the TX FIFO can accommodate @count bytes and that the DMA Enable flag
+ * in the CS register is set (such that a write to the FIFO register transmits
+ * 32-bit instead of just 8-bit).
+ */
+static inline void bcm2835_wr_fifo_count(struct bcm2835_spi *bs, int count)
+{
+	u32 val;
+
+	bs->tx_len -= count;
+
+	while (count > 0) {
+		if (bs->tx_buf) {
+			int len = min(count, 4);
+			memcpy(&val, bs->tx_buf, len);
+			bs->tx_buf += len;
+		} else {
+			val = 0;
+		}
+		bcm2835_wr(bs, BCM2835_SPI_FIFO, val);
+		count -= 4;
+	}
+}
+
+/**
+ * bcm2835_wait_tx_fifo_empty() - busy-wait for TX FIFO to empty
+ * @bs: BCM2835 SPI controller
+ */
+static inline void bcm2835_wait_tx_fifo_empty(struct bcm2835_spi *bs)
+{
+	while (!(bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_DONE))
+		cpu_relax();
+}
+
 static void bcm2835_spi_reset_hw(struct spi_master *master)
 {
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
@@ -209,15 +285,161 @@ static int bcm2835_spi_transfer_one_irq(struct spi_master *master,
  * the main one being that DMA transfers are limited to 16 bit
  * (so 0 to 65535 bytes) by the SPI HW due to BCM2835_SPI_DLEN
  *
- * also we currently assume that the scatter-gather fragments are
- * all multiple of 4 (except the last) - otherwise we would need
- * to reset the FIFO before subsequent transfers...
- * this also means that tx/rx transfers sg's need to be of equal size!
- *
  * there may be a few more border-cases we may need to address as well
  * but unfortunately this would mean splitting up the scatter-gather
  * list making it slightly unpractical...
  */
+
+/**
+ * bcm2835_spi_transfer_prologue() - transfer first few bytes without DMA
+ * @master: SPI master
+ * @tfr: SPI transfer
+ * @bs: BCM2835 SPI controller
+ * @cs: CS register
+ *
+ * A limitation in DMA mode is that the FIFO must be accessed in 4 byte chunks.
+ * Only the final write access is permitted to transmit less than 4 bytes, the
+ * SPI controller deduces its intended size from the DLEN register.
+ *
+ * If a TX or RX sglist contains multiple entries, one per page, and the first
+ * entry starts in the middle of a page, that first entry's length may not be
+ * a multiple of 4.  Subsequent entries are fine because they span an entire
+ * page, hence do have a length that's a multiple of 4.
+ *
+ * This cannot happen with kmalloc'ed buffers (which is what most clients use)
+ * because they are contiguous in physical memory and therefore not split on
+ * page boundaries by spi_map_buf().  But it *can* happen with vmalloc'ed
+ * buffers.
+ *
+ * The DMA engine is incapable of combining sglist entries into a continuous
+ * stream of 4 byte chunks, it treats every entry separately:  A TX entry is
+ * rounded up a to a multiple of 4 bytes by transmitting surplus bytes, an RX
+ * entry is rounded up by throwing away received bytes.
+ *
+ * Overcome this limitation by transferring the first few bytes without DMA:
+ * E.g. if the first TX sglist entry's length is 23 and the first RX's is 42,
+ * write 3 bytes to the TX FIFO but read only 2 bytes from the RX FIFO.
+ * The residue of 1 byte in the RX FIFO is picked up by DMA.  Together with
+ * the rest of the first RX sglist entry it makes up a multiple of 4 bytes.
+ *
+ * Should the RX prologue be larger, say, 3 vis-à-vis a TX prologue of 1,
+ * write 1 + 4 = 5 bytes to the TX FIFO and read 3 bytes from the RX FIFO.
+ * Caution, the additional 4 bytes spill over to the second TX sglist entry
+ * if the length of the first is *exactly* 1.
+ *
+ * At most 6 bytes are written and at most 3 bytes read.  Do we know the
+ * transfer has this many bytes?  Yes, see BCM2835_SPI_DMA_MIN_LENGTH.
+ *
+ * The FIFO is normally accessed with 8-bit width by the CPU and 32-bit width
+ * by the DMA engine.  Toggling the DMA Enable flag in the CS register switches
+ * the width but also garbles the FIFO's contents.  The prologue must therefore
+ * be transmitted in 32-bit width to ensure that the following DMA transfer can
+ * pick up the residue in the RX FIFO in ungarbled form.
+ */
+static void bcm2835_spi_transfer_prologue(struct spi_master *master,
+					  struct spi_transfer *tfr,
+					  struct bcm2835_spi *bs,
+					  u32 cs)
+{
+	int tx_remaining;
+
+	bs->tfr		 = tfr;
+	bs->tx_prologue  = 0;
+	bs->rx_prologue  = 0;
+	bs->tx_spillover = false;
+
+	if (!sg_is_last(&tfr->tx_sg.sgl[0]))
+		bs->tx_prologue = sg_dma_len(&tfr->tx_sg.sgl[0]) & 3;
+
+	if (!sg_is_last(&tfr->rx_sg.sgl[0])) {
+		bs->rx_prologue = sg_dma_len(&tfr->rx_sg.sgl[0]) & 3;
+
+		if (bs->rx_prologue > bs->tx_prologue) {
+			if (sg_is_last(&tfr->tx_sg.sgl[0])) {
+				bs->tx_prologue  = bs->rx_prologue;
+			} else {
+				bs->tx_prologue += 4;
+				bs->tx_spillover =
+					!(sg_dma_len(&tfr->tx_sg.sgl[0]) & ~3);
+			}
+		}
+	}
+
+	/* rx_prologue > 0 implies tx_prologue > 0, so check only the latter */
+	if (!bs->tx_prologue)
+		return;
+
+	/* Write and read RX prologue.  Adjust first entry in RX sglist. */
+	if (bs->rx_prologue) {
+		bcm2835_wr(bs, BCM2835_SPI_DLEN, bs->rx_prologue);
+		bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA
+						  | BCM2835_SPI_CS_DMAEN);
+		bcm2835_wr_fifo_count(bs, bs->rx_prologue);
+		bcm2835_wait_tx_fifo_empty(bs);
+		bcm2835_rd_fifo_count(bs, bs->rx_prologue);
+		bcm2835_spi_reset_hw(master);
+
+		dma_sync_sg_for_device(master->dma_rx->device->dev,
+				       tfr->rx_sg.sgl, 1, DMA_FROM_DEVICE);
+
+		tfr->rx_sg.sgl[0].dma_address += bs->rx_prologue;
+		tfr->rx_sg.sgl[0].length      -= bs->rx_prologue;
+	}
+
+	/*
+	 * Write remaining TX prologue.  Adjust first entry in TX sglist.
+	 * Also adjust second entry if prologue spills over to it.
+	 */
+	tx_remaining = bs->tx_prologue - bs->rx_prologue;
+	if (tx_remaining) {
+		bcm2835_wr(bs, BCM2835_SPI_DLEN, tx_remaining);
+		bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA
+						  | BCM2835_SPI_CS_DMAEN);
+		bcm2835_wr_fifo_count(bs, tx_remaining);
+		bcm2835_wait_tx_fifo_empty(bs);
+		bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_CLEAR_TX);
+	}
+
+	if (likely(!bs->tx_spillover)) {
+		tfr->tx_sg.sgl[0].dma_address += bs->tx_prologue;
+		tfr->tx_sg.sgl[0].length      -= bs->tx_prologue;
+	} else {
+		tfr->tx_sg.sgl[0].length       = 0;
+		tfr->tx_sg.sgl[1].dma_address += 4;
+		tfr->tx_sg.sgl[1].length      -= 4;
+	}
+}
+
+/**
+ * bcm2835_spi_undo_prologue() - reconstruct original sglist state
+ * @bs: BCM2835 SPI controller
+ *
+ * Undo changes which were made to an SPI transfer's sglist when transmitting
+ * the prologue.  This is necessary to ensure the same memory ranges are
+ * unmapped that were originally mapped.
+ */
+static void bcm2835_spi_undo_prologue(struct bcm2835_spi *bs)
+{
+	struct spi_transfer *tfr = bs->tfr;
+
+	if (!bs->tx_prologue)
+		return;
+
+	if (bs->rx_prologue) {
+		tfr->rx_sg.sgl[0].dma_address -= bs->rx_prologue;
+		tfr->rx_sg.sgl[0].length      += bs->rx_prologue;
+	}
+
+	if (likely(!bs->tx_spillover)) {
+		tfr->tx_sg.sgl[0].dma_address -= bs->tx_prologue;
+		tfr->tx_sg.sgl[0].length      += bs->tx_prologue;
+	} else {
+		tfr->tx_sg.sgl[0].length       = bs->tx_prologue - 4;
+		tfr->tx_sg.sgl[1].dma_address -= 4;
+		tfr->tx_sg.sgl[1].length      += 4;
+	}
+}
+
 static void bcm2835_spi_dma_done(void *data)
 {
 	struct spi_master *master = data;
@@ -233,6 +455,7 @@ static void bcm2835_spi_dma_done(void *data)
 	 */
 	if (cmpxchg(&bs->dma_pending, true, false)) {
 		dmaengine_terminate_all(master->dma_tx);
+		bcm2835_spi_undo_prologue(bs);
 	}
 
 	/* and mark as completed */;
@@ -283,20 +506,6 @@ static int bcm2835_spi_prepare_sg(struct spi_master *master,
 	return dma_submit_error(cookie);
 }
 
-static inline int bcm2835_check_sg_length(struct sg_table *sgt)
-{
-	int i;
-	struct scatterlist *sgl;
-
-	/* check that the sg entries are word-sized (except for last) */
-	for_each_sg(sgt->sgl, sgl, (int)sgt->nents - 1, i) {
-		if (sg_dma_len(sgl) % 4)
-			return -EFAULT;
-	}
-
-	return 0;
-}
-
 static int bcm2835_spi_transfer_one_dma(struct spi_master *master,
 					struct spi_device *spi,
 					struct spi_transfer *tfr,
@@ -305,18 +514,16 @@ static int bcm2835_spi_transfer_one_dma(struct spi_master *master,
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
 	int ret;
 
-	/* check that the scatter gather segments are all a multiple of 4 */
-	if (bcm2835_check_sg_length(&tfr->tx_sg) ||
-	    bcm2835_check_sg_length(&tfr->rx_sg)) {
-		dev_warn_once(&spi->dev,
-			      "scatter gather segment length is not a multiple of 4 - falling back to interrupt mode\n");
-		return bcm2835_spi_transfer_one_irq(master, spi, tfr, cs);
-	}
+	/*
+	 * Transfer first few bytes without DMA if length of first TX or RX
+	 * sglist entry is not a multiple of 4 bytes (hardware limitation).
+	 */
+	bcm2835_spi_transfer_prologue(master, tfr, bs, cs);
 
 	/* setup tx-DMA */
 	ret = bcm2835_spi_prepare_sg(master, tfr, true);
 	if (ret)
-		return ret;
+		goto err_reset_hw;
 
 	/* start TX early */
 	dma_async_issue_pending(master->dma_tx);
@@ -325,7 +532,7 @@ static int bcm2835_spi_transfer_one_dma(struct spi_master *master,
 	bs->dma_pending = 1;
 
 	/* set the DMA length */
-	bcm2835_wr(bs, BCM2835_SPI_DLEN, tfr->len);
+	bcm2835_wr(bs, BCM2835_SPI_DLEN, bs->tx_len);
 
 	/* start the HW */
 	bcm2835_wr(bs, BCM2835_SPI_CS,
@@ -340,8 +547,7 @@ static int bcm2835_spi_transfer_one_dma(struct spi_master *master,
 		/* need to reset on errors */
 		dmaengine_terminate_all(master->dma_tx);
 		bs->dma_pending = false;
-		bcm2835_spi_reset_hw(master);
-		return ret;
+		goto err_reset_hw;
 	}
 
 	/* start rx dma late */
@@ -349,6 +555,11 @@ static int bcm2835_spi_transfer_one_dma(struct spi_master *master,
 
 	/* wait for wakeup in framework */
 	return 1;
+
+err_reset_hw:
+	bcm2835_spi_reset_hw(master);
+	bcm2835_spi_undo_prologue(bs);
+	return ret;
 }
 
 static bool bcm2835_spi_can_dma(struct spi_master *master,
@@ -372,25 +583,6 @@ static bool bcm2835_spi_can_dma(struct spi_master *master,
 		return false;
 	}
 
-	/* if we run rx/tx_buf with word aligned addresses then we are OK */
-	if ((((size_t)tfr->rx_buf & 3) == 0) &&
-	    (((size_t)tfr->tx_buf & 3) == 0))
-		return true;
-
-	/* otherwise we only allow transfers within the same page
-	 * to avoid wasting time on dma_mapping when it is not practical
-	 */
-	if (((size_t)tfr->tx_buf & (PAGE_SIZE - 1)) + tfr->len > PAGE_SIZE) {
-		dev_warn_once(&spi->dev,
-			      "Unaligned spi tx-transfer bridging page\n");
-		return false;
-	}
-	if (((size_t)tfr->rx_buf & (PAGE_SIZE - 1)) + tfr->len > PAGE_SIZE) {
-		dev_warn_once(&spi->dev,
-			      "Unaligned spi rx-transfer bridging page\n");
-		return false;
-	}
-
 	/* return OK */
 	return true;
 }
@@ -614,6 +806,7 @@ static void bcm2835_spi_handle_err(struct spi_master *master,
 	if (cmpxchg(&bs->dma_pending, true, false)) {
 		dmaengine_terminate_all(master->dma_tx);
 		dmaengine_terminate_all(master->dma_rx);
+		bcm2835_spi_undo_prologue(bs);
 	}
 	/* and reset */
 	bcm2835_spi_reset_hw(master);
-- 
2.19.1


_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-11-08  7:06   ` [PATCH 3/7] spi: bcm2835: Fix race on DMA termination Lukas Wunner
@ 2018-11-08  7:06   ` Lukas Wunner
       [not found]     ` <901ff28c305e56d3349d3e044781c095d8e77a3d.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
  2018-11-08  7:06   ` [PATCH 2/7] spi: bcm2835: Fix book-keeping of DMA termination Lukas Wunner
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2018-11-08  7:06 UTC (permalink / raw)
  To: Mark Brown, Eric Anholt, Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

The RX and TX FIFO of the BCM2835 SPI master each accommodate 64 bytes
(16 32-bit dwords).  The CS register provides hints on their fill level:

   "Bit 19  RXR - RX FIFO needs Reading ([¾] full)
    0 = RX FIFO is less than [¾] full (or not active TA = 0).
    1 = RX FIFO is [¾] or more full. Cleared by reading sufficient
        data from the RX FIFO or setting TA to 0."

   "Bit 16  DONE - Transfer Done
    0 = Transfer is in progress (or not active TA = 0).
    1 = Transfer is complete. Cleared by writing more data to the
        TX FIFO or setting TA to 0."

   "If DONE is set [...], write up to 16 [dwords] to SPI_FIFO. [...]
    If RXR is set read 12 [dwords] data from SPI_FIFO."

   [Source: Pages 153, 154 and 158 of
    https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
    Note: The spec is missing the "¾" character, presumably due to
    copy-pasting from a different charset.  It also incorrectly
    refers to 16 and 12 "bytes" instead of 32-bit dwords.]

In short, the RXR bit indicates that 48 bytes can be read and the DONE
bit indicates 64 bytes can be written.  Leverage this knowledge to read
or write bytes blindly to the FIFO, without polling whether data can be
read or free space is available to write.  Moreover, when a transfer is
starting, the TX FIFO is known to be empty, likewise allowing a blind
write of 64 bytes.

This cuts the number of bus accesses in half if the fill level is known.
Also, the (posted) write accesses can be pipelined on the AXI bus since
they are no longer interleaved with (non-posted) reads.

bcm2835_spi_transfer_one_poll() previously switched to interrupt mode
when exceeding a time limit by calling bcm2835_spi_transfer_one_irq().
Because the latter now assumes an empty FIFO, it can no longer be called
directly.  Modify the CS register instead to achieve the same result.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Mathias Duckeck <m.duckeck@kunbus.de>
Cc: Frank Pavlic <f.pavlic@kunbus.de>
Cc: Martin Sperl <kernel@martin.sperl.org>
Cc: Noralf Trønnes <noralf@tronnes.org>
---
 drivers/spi/spi-bcm2835.c | 66 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 36719d2cc12d..fd9a73963f8c 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -72,6 +72,8 @@
 #define BCM2835_SPI_CS_CS_10		0x00000002
 #define BCM2835_SPI_CS_CS_01		0x00000001
 
+#define BCM2835_SPI_FIFO_SIZE		64
+#define BCM2835_SPI_FIFO_SIZE_3_4	48
 #define BCM2835_SPI_POLLING_LIMIT_US	30
 #define BCM2835_SPI_POLLING_JIFFIES	2
 #define BCM2835_SPI_DMA_MIN_LENGTH	96
@@ -213,6 +215,45 @@ static inline void bcm2835_wait_tx_fifo_empty(struct bcm2835_spi *bs)
 		cpu_relax();
 }
 
+/**
+ * bcm2835_rd_fifo_blind() - blindly read up to @count bytes from RX FIFO
+ * @bs: BCM2835 SPI controller
+ * @count: bytes available for reading in RX FIFO
+ */
+static inline void bcm2835_rd_fifo_blind(struct bcm2835_spi *bs, int count)
+{
+	u8 val;
+
+	count = min(count, bs->rx_len);
+	bs->rx_len -= count;
+
+	while (count) {
+		val = bcm2835_rd(bs, BCM2835_SPI_FIFO);
+		if (bs->rx_buf)
+			*bs->rx_buf++ = val;
+		count--;
+	}
+}
+
+/**
+ * bcm2835_wr_fifo_blind() - blindly write up to @count bytes to TX FIFO
+ * @bs: BCM2835 SPI controller
+ * @count: bytes available for writing in TX FIFO
+ */
+static inline void bcm2835_wr_fifo_blind(struct bcm2835_spi *bs, int count)
+{
+	u8 val;
+
+	count = min(count, bs->tx_len);
+	bs->tx_len -= count;
+
+	while (count) {
+		val = bs->tx_buf ? *bs->tx_buf++ : 0;
+		bcm2835_wr(bs, BCM2835_SPI_FIFO, val);
+		count--;
+	}
+}
+
 static void bcm2835_spi_reset_hw(struct spi_master *master)
 {
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
@@ -236,6 +277,20 @@ static irqreturn_t bcm2835_spi_interrupt(int irq, void *dev_id)
 {
 	struct spi_master *master = dev_id;
 	struct bcm2835_spi *bs = spi_master_get_devdata(master);
+	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
+
+	/*
+	 * An interrupt is signaled either if RXR is set (RX FIFO >= ¾ full)
+	 * or if DONE is set (TX FIFO empty, but RX FIFO may contain residue).
+	 * TX is halted once the RX FIFO is full, so drain the RX FIFO first.
+	 */
+	if (cs & BCM2835_SPI_CS_RXF)
+		bcm2835_rd_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
+	else if (cs & BCM2835_SPI_CS_RXR)
+		bcm2835_rd_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE_3_4);
+
+	if (bs->tx_len && cs & BCM2835_SPI_CS_DONE)
+		bcm2835_wr_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
 
 	/* Read as many bytes as possible from FIFO */
 	bcm2835_rd_fifo(bs);
@@ -266,6 +321,7 @@ static int bcm2835_spi_transfer_one_irq(struct spi_master *master,
 	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
 
 	/* fill TX FIFO as much as possible */
+	bcm2835_wr_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
 	bcm2835_wr_fifo(bs);
 
 	/* enable interrupts */
@@ -672,13 +728,13 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
 	unsigned long timeout;
 
 	/* enable HW block without interrupts */
-	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
+	bcm2835_wr(bs, BCM2835_SPI_CS, cs |= BCM2835_SPI_CS_TA);
 
 	/* fill in the fifo before timeout calculations
 	 * if we are interrupted here, then the data is
 	 * getting transferred by the HW while we are interrupted
 	 */
-	bcm2835_wr_fifo(bs);
+	bcm2835_wr_fifo_blind(bs, BCM2835_SPI_FIFO_SIZE);
 
 	/* set the timeout */
 	timeout = jiffies + BCM2835_SPI_POLLING_JIFFIES;
@@ -700,8 +756,10 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
 					    jiffies - timeout,
 					    bs->tx_len, bs->rx_len);
 			/* fall back to interrupt mode */
-			return bcm2835_spi_transfer_one_irq(master, spi,
-							    tfr, cs);
+			cs |= BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD;
+			bcm2835_wr(bs, BCM2835_SPI_CS, cs);
+			/* tell SPI core to wait for completion */
+			return 1;
 		}
 	}
 
-- 
2.19.1


_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] spi: bcm2835: Overcome sglist entry length limitation
       [not found]     ` <eb5ce210b06fb68580961038412f9499c3e56a76.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
@ 2018-11-09 15:43       ` Stefan Wahren
  0 siblings, 0 replies; 18+ messages in thread
From: Stefan Wahren @ 2018-11-09 15:43 UTC (permalink / raw)
  To: Lukas Wunner, Mark Brown, Eric Anholt
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

Am 08.11.18 um 08:06 schrieb Lukas Wunner:
> When in DMA mode, the BCM2835 SPI controller requires that the FIFO is
> accessed in 4 byte chunks.  This rule is not fulfilled if a transfer
> consists of multiple sglist entries, one per page, and the first entry
> starts in the middle of a page with an offset not a multiple of 4.
>
> The driver currently falls back to programmed I/O for such transfers,
> incurring a significant performance penalty.
>
> Overcome this hardware limitation by transferring the first few bytes of
> a transfer without DMA such that the remainder of the first sglist entry
> becomes a multiple of 4.  Specifics are provided in kerneldoc comments.
>
> An alternative approach would have been to split transfers in the
> ->prepare_message hook, but this may necessitate two transfers per page,
> defeating the goal of clustering multiple pages together in a single
> transfer for efficiency.  E.g. if the first TX sglist entry's length is
> 23 and the first RX's is 40, the first transfer would send and receive
> 23 bytes, the second 40 - 23 = 17 bytes, the third 4096 - 17 = 4079
> bytes, the fourth 4096 - 4079 = 17 bytes and so on.  In other words,
> O(n) transfers are necessary (n = number of sglist entries), whereas
> the algorithm implemented herein only requires O(1) additional work.
>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Mathias Duckeck <m.duckeck@kunbus.de>
> Cc: Frank Pavlic <f.pavlic@kunbus.de>
> Cc: Martin Sperl <kernel@martin.sperl.org>
> Cc: Noralf Trønnes <noralf@tronnes.org>
> ---
>  drivers/spi/spi-bcm2835.c | 291 +++++++++++++++++++++++++++++++-------
>  1 file changed, 242 insertions(+), 49 deletions(-)
>
> diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
> index 9b9b9926a956..36719d2cc12d 100644
> --- a/drivers/spi/spi-bcm2835.c
> +++ b/drivers/spi/spi-bcm2835.c
> @@ -85,20 +85,30 @@
>   * @regs: base address of register map
>   * @clk: core clock, divided to calculate serial clock
>   * @irq: interrupt, signals TX FIFO empty or RX FIFO ¾ full
> + * @tfr: SPI transfer currently processed
>   * @tx_buf: pointer whence next transmitted byte is read
>   * @rx_buf: pointer where next received byte is written
>   * @tx_len: remaining bytes to transmit
>   * @rx_len: remaining bytes to receive
> + * @tx_prologue: bytes transmitted without DMA if first TX sglist entry's
> + *	length is not a multiple of 4 (to overcome hardware limitation)
> + * @rx_prologue: bytes received without DMA if first RX sglist entry's
> + *	length is not a multiple of 4 (to overcome hardware limitation)
> + * @tx_spillover: whether @tx_prologue spills over to second TX sglist entry
>   * @dma_pending: whether a DMA transfer is in progress
>   */
>  struct bcm2835_spi {
>  	void __iomem *regs;
>  	struct clk *clk;
>  	int irq;
> +	struct spi_transfer *tfr;
>  	const u8 *tx_buf;
>  	u8 *rx_buf;
>  	int tx_len;
>  	int rx_len;
> +	int tx_prologue;
> +	int rx_prologue;
> +	bool tx_spillover;
>  	bool dma_pending;
>  };
>  
> @@ -137,6 +147,72 @@ static inline void bcm2835_wr_fifo(struct bcm2835_spi *bs)
>  	}
>  }
>  
> +/**
> + * bcm2835_rd_fifo_count() - blindly read exactly @count bytes from RX FIFO
> + * @bs: BCM2835 SPI controller
> + * @count: bytes to read from RX FIFO
> + *
> + * The caller must ensure that @bs->rx_len is greater than or equal to @count,
> + * that the RX FIFO contains at least @count bytes and that the DMA Enable flag
> + * in the CS register is set (such that a read from the FIFO register receives
> + * 32-bit instead of just 8-bit).
> + */
> +static inline void bcm2835_rd_fifo_count(struct bcm2835_spi *bs, int count)
> +{
> +	u32 val;
> +
> +	bs->rx_len -= count;
> +
> +	while (count > 0) {
> +		val = bcm2835_rd(bs, BCM2835_SPI_FIFO);
> +		if (bs->rx_buf) {
> +			int len = min(count, 4);
> +			memcpy(bs->rx_buf, &val, len);
> +			bs->rx_buf += len;
> +		}
> +		count -= 4;
> +	}
> +}
> +
> +/**
> + * bcm2835_wr_fifo_count() - blindly write exactly @count bytes to TX FIFO
> + * @bs: BCM2835 SPI controller
> + * @count: bytes to write to TX FIFO
> + *
> + * The caller must ensure that @bs->tx_len is greater than or equal to @count,
> + * that the TX FIFO can accommodate @count bytes and that the DMA Enable flag
> + * in the CS register is set (such that a write to the FIFO register transmits
> + * 32-bit instead of just 8-bit).
> + */
> +static inline void bcm2835_wr_fifo_count(struct bcm2835_spi *bs, int count)
> +{
> +	u32 val;
> +
> +	bs->tx_len -= count;
> +
> +	while (count > 0) {
> +		if (bs->tx_buf) {
> +			int len = min(count, 4);
> +			memcpy(&val, bs->tx_buf, len);
> +			bs->tx_buf += len;
> +		} else {
> +			val = 0;
> +		}
> +		bcm2835_wr(bs, BCM2835_SPI_FIFO, val);
> +		count -= 4;
> +	}
> +}
> +
> +/**
> + * bcm2835_wait_tx_fifo_empty() - busy-wait for TX FIFO to empty
> + * @bs: BCM2835 SPI controller
> + */
> +static inline void bcm2835_wait_tx_fifo_empty(struct bcm2835_spi *bs)
> +{
> +	while (!(bcm2835_rd(bs, BCM2835_SPI_CS) & BCM2835_SPI_CS_DONE))
> +		cpu_relax();
> +}
Can we have some kind of timeout here, so we never spin forever in case
hw doesn't behave as expected?

_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/7] spi: bcm2835: Drop unused code for native Chip Select
       [not found]     ` <a24869503ed4e867b11c66c8615a4d5cddb3b2b5.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
@ 2018-11-10  9:07       ` kernel-TqfNSX0MhmxHKSADF0wUEw
  0 siblings, 0 replies; 18+ messages in thread
From: kernel-TqfNSX0MhmxHKSADF0wUEw @ 2018-11-10  9:07 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Noralf Tronnes, Mathias Duckeck,
	linux-spi-u79uwXL29TY76Z2rM5mHXA, Mark Brown,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic


> On 08.11.2018, at 08:06, Lukas Wunner <lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org> wrote:
> 
> Commit a30a555d7435 ("spi: bcm2835: transform native-cs to gpio-cs on
> first spi_setup") disabled the use of hardware-controlled native Chip
> Select in favour of software-controlled GPIO Chip Select but left code
> to support the former untouched.  Remove it to simplify the driver and
> ease the addition of new features and further optimizations.

I believe downstream is still using native CS instead of GPIO by default.
(Not sure if this really still applies), so they would still be using
this code-fragment.

IIRC the code was als kept for DT-backwards compatibility when the GPIO
function is set to ALT0 - but I may be wrong there...

Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7] Raspberry Pi spi0 improvements
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
                     ` (6 preceding siblings ...)
  2018-11-08  7:06   ` [PATCH 6/7] spi: bcm2835: Overcome sglist entry length limitation Lukas Wunner
@ 2018-11-10  9:13   ` kernel-TqfNSX0MhmxHKSADF0wUEw
  2018-11-14  5:12   ` Florian Fainelli
  8 siblings, 0 replies; 18+ messages in thread
From: kernel-TqfNSX0MhmxHKSADF0wUEw @ 2018-11-10  9:13 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Noralf Tronnes, Mathias Duckeck,
	linux-spi-u79uwXL29TY76Z2rM5mHXA, Mark Brown,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

Patches: 1-5:
Reviewed-By: Martin Sperl <kernel-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>

> On 08.11.2018, at 08:06, Lukas Wunner <lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org> wrote:
> 
> Here's a first batch of improvements for the spi0 master on the
> Raspberry Pi.  The meat of the series is in its last two patches:
> 
> * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
>  multiple of 4.  This overcomes a limitation affecting Ethernet drivers
>  such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
>  deliberately unaligned receive buffers.
> 
> * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
>  is known to contain data, or the TX FIFO when it is known to have free
>  space.
> 
> The preceding patches fix rarely encountered bugs, remove obsolete code
> and add documentation.
> 
> The series has been tested extensively on the "Revolution Pi" family of
> open source PLCs (https://revolution.kunbus.com/), but further testing
> would be welcome to raise the confidence.
> 
> Thanks,
> 
> Lukas
> 
> 
> Lukas Wunner (7):
>  spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
>  spi: bcm2835: Fix book-keeping of DMA termination
>  spi: bcm2835: Fix race on DMA termination
>  spi: bcm2835: Drop unused code for native Chip Select
>  spi: bcm2835: Document struct bcm2835_spi
>  spi: bcm2835: Overcome sglist entry length limitation
>  spi: bcm2835: Speed up FIFO access if fill level is known
> 
> drivers/spi/spi-bcm2835.c | 478 ++++++++++++++++++++++++++------------
> 1 file changed, 334 insertions(+), 144 deletions(-)
> 
> -- 
> 2.19.1
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known
       [not found]     ` <901ff28c305e56d3349d3e044781c095d8e77a3d.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
@ 2018-11-10 10:03       ` kernel-TqfNSX0MhmxHKSADF0wUEw
       [not found]         ` <807EBC97-54BD-49D5-86C8-3768FB4C0105-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: kernel-TqfNSX0MhmxHKSADF0wUEw @ 2018-11-10 10:03 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Noralf Tronnes, Mathias Duckeck,
	linux-spi-u79uwXL29TY76Z2rM5mHXA, Mark Brown,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic


> On 08.11.2018, at 08:06, Lukas Wunner <lukas@wunner.de> wrote:
> 
> The RX and TX FIFO of the BCM2835 SPI master each accommodate 64 bytes
> (16 32-bit dwords).  The CS register provides hints on their fill level:
> 
>   "Bit 19  RXR - RX FIFO needs Reading ([¾] full)
>    0 = RX FIFO is less than [¾] full (or not active TA = 0).
>    1 = RX FIFO is [¾] or more full. Cleared by reading sufficient
>        data from the RX FIFO or setting TA to 0."
> 
>   "Bit 16  DONE - Transfer Done
>    0 = Transfer is in progress (or not active TA = 0).
>    1 = Transfer is complete. Cleared by writing more data to the
>        TX FIFO or setting TA to 0."
> 
>   "If DONE is set [...], write up to 16 [dwords] to SPI_FIFO. [...]
>    If RXR is set read 12 [dwords] data from SPI_FIFO."
> 
>   [Source: Pages 153, 154 and 158 of
>    https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
>    Note: The spec is missing the "¾" character, presumably due to
>    copy-pasting from a different charset.  It also incorrectly
>    refers to 16 and 12 "bytes" instead of 32-bit dwords.]
> 
> In short, the RXR bit indicates that 48 bytes can be read and the DONE
> bit indicates 64 bytes can be written.  Leverage this knowledge to read
> or write bytes blindly to the FIFO, without polling whether data can be
> read or free space is available to write.  Moreover, when a transfer is
> starting, the TX FIFO is known to be empty, likewise allowing a blind
> write of 64 bytes.
> 
> This cuts the number of bus accesses in half if the fill level is known.
> Also, the (posted) write accesses can be pipelined on the AXI bus since
> they are no longer interleaved with (non-posted) reads.
> 
> bcm2835_spi_transfer_one_poll() previously switched to interrupt mode
> when exceeding a time limit by calling bcm2835_spi_transfer_one_irq().
> Because the latter now assumes an empty FIFO, it can no longer be called
> directly.  Modify the CS register instead to achieve the same result.
> 
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Mathias Duckeck <m.duckeck@kunbus.de>
> Cc: Frank Pavlic <f.pavlic@kunbus.de>
> Cc: Martin Sperl <kernel@martin.sperl.org>
> Cc: Noralf Trønnes <noralf@tronnes.org>
> ---
> drivers/spi/spi-bcm2835.c | 66 ++++++++++++++++++++++++++++++++++++---
> 1 file changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
> index 36719d2cc12d..fd9a73963f8c 100644
> --- a/drivers/spi/spi-bcm2835.c
> +++ b/drivers/spi/spi-bcm2835.c
> @@ -72,6 +72,8 @@
> #define BCM2835_SPI_CS_CS_10		0x00000002
> #define BCM2835_SPI_CS_CS_01		0x00000001
> 
> +#define BCM2835_SPI_FIFO_SIZE		64
> +#define BCM2835_SPI_FIFO_SIZE_3_4	48
Not sure if this should not be a DT parameter describing the HW block
and not being hardcoded in the driver.

> @@ -672,13 +728,13 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
> 	unsigned long timeout;
> 
> 	/* enable HW block without interrupts */
> -	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
> +	bcm2835_wr(bs, BCM2835_SPI_CS, cs |= BCM2835_SPI_CS_TA);
I believe coding style says: variable assignments should be separated
from function call for readability.
(similar to the code below)
> @@ -700,8 +756,10 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
> 					    jiffies - timeout,
> 					    bs->tx_len, bs->rx_len);
> 			/* fall back to interrupt mode */
> -			return bcm2835_spi_transfer_one_irq(master, spi,
> -							    tfr, cs);
> +			cs |= BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD;
> +			bcm2835_wr(bs, BCM2835_SPI_CS, cs);
> +			/* tell SPI core to wait for completion */
> +			return 1;
> 		}
> 	}
What is wrong about using bcm2835_spi_transfer_one_irq?
It makes the intention clear that we switch to irq mode.

Why are you open-coding it instead?

If there is a slight different behavior now then I would recommend amending
the transfer_one_irq to handle the behavior.

If it is done by adding a “fill bytes” argument to transfer_one_irq that 
may be set to 0 in this case then the compiler should be able to optimize
the unnecessary code away.

Martin
_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known
       [not found]         ` <807EBC97-54BD-49D5-86C8-3768FB4C0105-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
@ 2018-11-10 11:25           ` Stefan Wahren
       [not found]             ` <20181113080740.lrhfo656m7e4kb7a@wunner.de>
       [not found]             ` <52205641.172367.1541849134970-uEpKuDZ350hmhno068Nerg@public.gmane.org>
  0 siblings, 2 replies; 18+ messages in thread
From: Stefan Wahren @ 2018-11-10 11:25 UTC (permalink / raw)
  To: kernel-TqfNSX0MhmxHKSADF0wUEw, Lukas Wunner
  Cc: Mathias Duckeck, Mark Brown, linux-spi-u79uwXL29TY76Z2rM5mHXA,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

> kernel@martin.sperl.org hat am 10. November 2018 um 11:03 geschrieben:
> 
> 
> 
> > On 08.11.2018, at 08:06, Lukas Wunner <lukas@wunner.de> wrote:
> > 
> > The RX and TX FIFO of the BCM2835 SPI master each accommodate 64 bytes
> > (16 32-bit dwords).  The CS register provides hints on their fill level:
> > 
> >   "Bit 19  RXR - RX FIFO needs Reading ([¾] full)
> >    0 = RX FIFO is less than [¾] full (or not active TA = 0).
> >    1 = RX FIFO is [¾] or more full. Cleared by reading sufficient
> >        data from the RX FIFO or setting TA to 0."
> > 
> >   "Bit 16  DONE - Transfer Done
> >    0 = Transfer is in progress (or not active TA = 0).
> >    1 = Transfer is complete. Cleared by writing more data to the
> >        TX FIFO or setting TA to 0."
> > 
> >   "If DONE is set [...], write up to 16 [dwords] to SPI_FIFO. [...]
> >    If RXR is set read 12 [dwords] data from SPI_FIFO."
> > 
> >   [Source: Pages 153, 154 and 158 of
> >    https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
> >    Note: The spec is missing the "¾" character, presumably due to
> >    copy-pasting from a different charset.  It also incorrectly
> >    refers to 16 and 12 "bytes" instead of 32-bit dwords.]
> > 
> > In short, the RXR bit indicates that 48 bytes can be read and the DONE
> > bit indicates 64 bytes can be written.  Leverage this knowledge to read
> > or write bytes blindly to the FIFO, without polling whether data can be
> > read or free space is available to write.  Moreover, when a transfer is
> > starting, the TX FIFO is known to be empty, likewise allowing a blind
> > write of 64 bytes.
> > 
> > This cuts the number of bus accesses in half if the fill level is known.
> > Also, the (posted) write accesses can be pipelined on the AXI bus since
> > they are no longer interleaved with (non-posted) reads.
> > 
> > bcm2835_spi_transfer_one_poll() previously switched to interrupt mode
> > when exceeding a time limit by calling bcm2835_spi_transfer_one_irq().
> > Because the latter now assumes an empty FIFO, it can no longer be called
> > directly.  Modify the CS register instead to achieve the same result.
> > 
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > Cc: Mathias Duckeck <m.duckeck@kunbus.de>
> > Cc: Frank Pavlic <f.pavlic@kunbus.de>
> > Cc: Martin Sperl <kernel@martin.sperl.org>
> > Cc: Noralf Trønnes <noralf@tronnes.org>
> > ---
> > drivers/spi/spi-bcm2835.c | 66 ++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 62 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
> > index 36719d2cc12d..fd9a73963f8c 100644
> > --- a/drivers/spi/spi-bcm2835.c
> > +++ b/drivers/spi/spi-bcm2835.c
> > @@ -72,6 +72,8 @@
> > #define BCM2835_SPI_CS_CS_10		0x00000002
> > #define BCM2835_SPI_CS_CS_01		0x00000001
> > 
> > +#define BCM2835_SPI_FIFO_SIZE		64
> > +#define BCM2835_SPI_FIFO_SIZE_3_4	48
> Not sure if this should not be a DT parameter describing the HW block
> and not being hardcoded in the driver.

I see no reason for this. AFAIK all variants of the BCM2835 have the same FIFO length, so i'm fine with the defines. I only have doubts about the naming FIFO_SIZE_3_4 because it describe a fill level not a size.

Stefan

_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known
       [not found]               ` <20181113080740.lrhfo656m7e4kb7a-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
@ 2018-11-13 19:07                 ` Stefan Wahren
       [not found]                   ` <230536172.259102.1542136037433-uEpKuDZ350hmhno068Nerg@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Stefan Wahren @ 2018-11-13 19:07 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Mathias Duckeck, Mark Brown, linux-spi-u79uwXL29TY76Z2rM5mHXA,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic

> Lukas Wunner <lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org> hat am 13. November 2018 um 09:07 geschrieben:
> 
> 
> On Sat, Nov 10, 2018 at 12:25:34PM +0100, Stefan Wahren wrote:
> > > On 08.11.2018, at 08:06, Lukas Wunner <lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org> wrote:
> > > > +#define BCM2835_SPI_FIFO_SIZE		64
> > > > +#define BCM2835_SPI_FIFO_SIZE_3_4	48
> > 
> > I only have doubts about the naming FIFO_SIZE_3_4 because it describe
> > a fill level not a size.
> 
> Hm, it's three quarters of the FIFO's size, so seems sufficiently apt?

Just a thought because only from the define name i wouldn't think of three quarters first.
Since i don't have a better solution, please go on.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7] Raspberry Pi spi0 improvements
       [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
                     ` (7 preceding siblings ...)
  2018-11-10  9:13   ` [PATCH 0/7] Raspberry Pi spi0 improvements kernel-TqfNSX0MhmxHKSADF0wUEw
@ 2018-11-14  5:12   ` Florian Fainelli
       [not found]     ` <20181114055121.5xpwxcu6a5qsgjqv@wunner.de>
  8 siblings, 1 reply; 18+ messages in thread
From: Florian Fainelli @ 2018-11-14  5:12 UTC (permalink / raw)
  To: Lukas Wunner, Mark Brown, Eric Anholt, Stefan Wahren
  Cc: Mathias Duckeck, Frank Pavlic, Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-spi-u79uwXL29TY76Z2rM5mHXA



On 11/7/2018 11:06 PM, Lukas Wunner wrote:
> Here's a first batch of improvements for the spi0 master on the
> Raspberry Pi.  The meat of the series is in its last two patches:
> 
> * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
>   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
>   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
>   deliberately unaligned receive buffers.
> 
> * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
>   is known to contain data, or the TX FIFO when it is known to have free
>   space.
> 
> The preceding patches fix rarely encountered bugs, remove obsolete code
> and add documentation.
> 
> The series has been tested extensively on the "Revolution Pi" family of
> open source PLCs (https://revolution.kunbus.com/), but further testing
> would be welcome to raise the confidence.

Do you have some performance numbers that you could share before/after,
e.g: transfer latencies, number of interrupts and pure throughput?
Thanks for doing this work!
-- 
Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known
       [not found]                   ` <230536172.259102.1542136037433-uEpKuDZ350hmhno068Nerg@public.gmane.org>
@ 2018-11-14  5:14                     ` Florian Fainelli
  0 siblings, 0 replies; 18+ messages in thread
From: Florian Fainelli @ 2018-11-14  5:14 UTC (permalink / raw)
  To: Stefan Wahren, Lukas Wunner
  Cc: Mark Brown, linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic



On 11/13/2018 11:07 AM, Stefan Wahren wrote:
>> Lukas Wunner <lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org> hat am 13. November 2018 um 09:07 geschrieben:
>>
>>
>> On Sat, Nov 10, 2018 at 12:25:34PM +0100, Stefan Wahren wrote:
>>>> On 08.11.2018, at 08:06, Lukas Wunner <lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org> wrote:
>>>>> +#define BCM2835_SPI_FIFO_SIZE		64
>>>>> +#define BCM2835_SPI_FIFO_SIZE_3_4	48
>>>
>>> I only have doubts about the naming FIFO_SIZE_3_4 because it describe
>>> a fill level not a size.
>>
>> Hm, it's three quarters of the FIFO's size, so seems sufficiently apt?
> 
> Just a thought because only from the define name i wouldn't think of three quarters first.
> Since i don't have a better solution, please go on.

Does this have to be a constant, or could we just go about defining a
macro which computes any percentage of that value (or quarter, which
ever is most convienent), e.g:

#define BCM2835_SPI_FIFO_SIZE_PCT(pct)	\
	((BCM2835_SPI_FIFO_SIZE * (pct)) / 100)

That might be clearer and more future proof in case you want to
implement a low watermark in the future?
-- 
Florian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/7] Raspberry Pi spi0 improvements
       [not found]       ` <20181114055121.5xpwxcu6a5qsgjqv-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
@ 2018-11-16  5:11         ` Eric Anholt
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Anholt @ 2018-11-16  5:11 UTC (permalink / raw)
  To: Lukas Wunner, Florian Fainelli
  Cc: Noralf Tronnes, linux-spi-u79uwXL29TY76Z2rM5mHXA,
	Mathias Duckeck, Mark Brown,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic


[-- Attachment #1.1: Type: text/plain, Size: 1850 bytes --]

Lukas Wunner <lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org> writes:

> On Tue, Nov 13, 2018 at 09:12:01PM -0800, Florian Fainelli wrote:
>> On 11/7/2018 11:06 PM, Lukas Wunner wrote:
>> > Here's a first batch of improvements for the spi0 master on the
>> > Raspberry Pi.  The meat of the series is in its last two patches:
>> > 
>> > * Patch [6/7] allows DMA for transfer buffers starting at an offset not a
>> >   multiple of 4.  This overcomes a limitation affecting Ethernet drivers
>> >   such as ks8851 which call netdev_alloc_skb_ip_align() to allocate
>> >   deliberately unaligned receive buffers.
>> > 
>> > * Patch [7/7] speeds up PIO transfers by not polling the RX FIFO when it
>> >   is known to contain data, or the TX FIFO when it is known to have free
>> >   space.
>> 
>> Do you have some performance numbers that you could share before/after,
>> e.g: transfer latencies, number of interrupts and pure throughput?
>
> The throughput is primarily determined by the serial clock configured in
> the DT for a specific SPI slave.  There's nothing we can improve there
> in software.
>
> This series is about reducing CPU usage.  E.g. without patch [6/7],
> transfer buffers not aligned to 32-bit are transmitted with programmed I/O
> instead of DMA.  Thus, constantly receiving packets on a ks8851 Ethernet
> chip with a serial clock of 20 MHz occupies 25% of a CPU on a RasPi 3
> (for the ks8851 IRQ thread).  With the patch, it drops to a negligible
> percentage.  The spi-bcm2835.c driver currently forces PIO even for
> kmalloc'ed buffers which are always contiguous in physical memory,
> i.e. for no reason at all.

With the whole series, I got an improvement in kmscube on my hx8357d spi
panel and vc4 rendering from 5.9fps to 6.6fps.  No stats, but the
numbers seem fairly stable between runs.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 206 bytes --]

_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known
       [not found]             ` <52205641.172367.1541849134970-uEpKuDZ350hmhno068Nerg@public.gmane.org>
@ 2018-11-28 15:58               ` Mark Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2018-11-28 15:58 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: linux-spi-u79uwXL29TY76Z2rM5mHXA, Mathias Duckeck,
	Noralf Tronnes,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Frank Pavlic


[-- Attachment #1.1: Type: text/plain, Size: 939 bytes --]

On Sat, Nov 10, 2018 at 12:25:34PM +0100, Stefan Wahren wrote:

> > > +#define BCM2835_SPI_FIFO_SIZE		64
> > > +#define BCM2835_SPI_FIFO_SIZE_3_4	48

> > Not sure if this should not be a DT parameter describing the HW block
> > and not being hardcoded in the driver.

> I see no reason for this. AFAIK all variants of the BCM2835 have the
> same FIFO length, so i'm fine with the defines. I only have doubts
> about the naming FIFO_SIZE_3_4 because it describe a fill level not a
> size.

It's also the sort of thing we should be picking up from the compatible
string since we have a per-SoC compatible rather than parameterizing in
the DT - that way the SoC just needs the right compatible string in the
DT and then new quirks for that SoC don't need DT changes.

Please fix your mail client to word wrap within paragraphs at something
substantially less than 80 columns.  Doing this makes your messages much
easier to read and reply to.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 206 bytes --]

_______________________________________________
linux-rpi-kernel mailing list
linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
http://lists.infradead.org/mailman/listinfo/linux-rpi-kernel

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-11-28 15:58 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-08  7:06 [PATCH 0/7] Raspberry Pi spi0 improvements Lukas Wunner
     [not found] ` <cover.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
2018-11-08  7:06   ` [PATCH 4/7] spi: bcm2835: Drop unused code for native Chip Select Lukas Wunner
     [not found]     ` <a24869503ed4e867b11c66c8615a4d5cddb3b2b5.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
2018-11-10  9:07       ` kernel-TqfNSX0MhmxHKSADF0wUEw
2018-11-08  7:06   ` [PATCH 1/7] spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode Lukas Wunner
2018-11-08  7:06   ` [PATCH 3/7] spi: bcm2835: Fix race on DMA termination Lukas Wunner
2018-11-08  7:06   ` [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known Lukas Wunner
     [not found]     ` <901ff28c305e56d3349d3e044781c095d8e77a3d.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
2018-11-10 10:03       ` kernel-TqfNSX0MhmxHKSADF0wUEw
     [not found]         ` <807EBC97-54BD-49D5-86C8-3768FB4C0105-TqfNSX0MhmxHKSADF0wUEw@public.gmane.org>
2018-11-10 11:25           ` Stefan Wahren
     [not found]             ` <20181113080740.lrhfo656m7e4kb7a@wunner.de>
     [not found]               ` <20181113080740.lrhfo656m7e4kb7a-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
2018-11-13 19:07                 ` Stefan Wahren
     [not found]                   ` <230536172.259102.1542136037433-uEpKuDZ350hmhno068Nerg@public.gmane.org>
2018-11-14  5:14                     ` Florian Fainelli
     [not found]             ` <52205641.172367.1541849134970-uEpKuDZ350hmhno068Nerg@public.gmane.org>
2018-11-28 15:58               ` Mark Brown
2018-11-08  7:06   ` [PATCH 2/7] spi: bcm2835: Fix book-keeping of DMA termination Lukas Wunner
2018-11-08  7:06   ` [PATCH 5/7] spi: bcm2835: Document struct bcm2835_spi Lukas Wunner
2018-11-08  7:06   ` [PATCH 6/7] spi: bcm2835: Overcome sglist entry length limitation Lukas Wunner
     [not found]     ` <eb5ce210b06fb68580961038412f9499c3e56a76.1541659680.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
2018-11-09 15:43       ` Stefan Wahren
2018-11-10  9:13   ` [PATCH 0/7] Raspberry Pi spi0 improvements kernel-TqfNSX0MhmxHKSADF0wUEw
2018-11-14  5:12   ` Florian Fainelli
     [not found]     ` <20181114055121.5xpwxcu6a5qsgjqv@wunner.de>
     [not found]       ` <20181114055121.5xpwxcu6a5qsgjqv-JFq808J9C/izQB+pC5nmwQ@public.gmane.org>
2018-11-16  5:11         ` Eric Anholt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.