linux-spi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Raspberry Pi SPI unbind fixes
@ 2020-05-15 15:58 Lukas Wunner
  2020-05-15 15:58 ` [PATCH 1/5] spi: Fix controller unregister order Lukas Wunner
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Lukas Wunner @ 2020-05-15 15:58 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel, Linus Walleij, Jingoo Han

This series fixes ordering issues occurring on unbind of the
Raspberry Pi SPI drivers:

Turns out devm_spi_register_controller() is prone to
incorrect use and dozens of drivers have gotten it wrong.
I'm only documenting this gotcha here and fixing it in the
Raspberry Pi drivers.  Fixing the rest is for another day.

There's also an ordering issue in the core which has been
present for 8 years and affects all platforms (patch [1/5]).
Doesn't look like unbinding is tested all that often. :-)


Lukas Wunner (5):
  spi: Fix controller unregister order
  spi: bcm2835: Fix controller unregister order
  spi: bcm2835aux: Fix controller unregister order
  spi: bcm2835: Tear down DMA before turning off SPI controller
  spi: Document devm_spi_register_controller() gotcha

 drivers/spi/spi-bcm2835.c    |  8 +++++---
 drivers/spi/spi-bcm2835aux.c |  4 +++-
 drivers/spi/spi.c            | 10 +++++++++-
 3 files changed, 17 insertions(+), 5 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] spi: Fix controller unregister order
  2020-05-15 15:58 [PATCH 0/5] Raspberry Pi SPI unbind fixes Lukas Wunner
@ 2020-05-15 15:58 ` Lukas Wunner
  2020-05-15 16:27   ` Mark Brown
  2020-05-15 15:58 ` [PATCH 2/5] spi: bcm2835: " Lukas Wunner
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2020-05-15 15:58 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel, Linus Walleij

When an SPI controller unregisters, it unbinds all its slave devices.
For this, their drivers may need to access the SPI bus, e.g. to quiesce
interrupts.

However since commit ffbbdd21329f ("spi: create a message queueing
infrastructure"), spi_destroy_queue() is executed before unbinding the
slaves.  It sets ctlr->running = false, thereby preventing SPI bus
access and causing unbinding of slave devices to fail.

Fix by unbinding slaves before calling spi_destroy_queue().

Fixes: ffbbdd21329f ("spi: create a message queueing infrastructure")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.4+
Cc: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/spi/spi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index c083ee3995e4..d32bdc6cbf66 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2761,6 +2761,8 @@ void spi_unregister_controller(struct spi_controller *ctlr)
 	struct spi_controller *found;
 	int id = ctlr->bus_num;
 
+	device_for_each_child(&ctlr->dev, NULL, __unregister);
+
 	/* First make sure that this controller was ever added */
 	mutex_lock(&board_lock);
 	found = idr_find(&spi_master_idr, id);
@@ -2773,7 +2775,6 @@ void spi_unregister_controller(struct spi_controller *ctlr)
 	list_del(&ctlr->list);
 	mutex_unlock(&board_lock);
 
-	device_for_each_child(&ctlr->dev, NULL, __unregister);
 	device_unregister(&ctlr->dev);
 	/* free bus id */
 	mutex_lock(&board_lock);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] spi: bcm2835: Fix controller unregister order
  2020-05-15 15:58 [PATCH 0/5] Raspberry Pi SPI unbind fixes Lukas Wunner
  2020-05-15 15:58 ` [PATCH 1/5] spi: Fix controller unregister order Lukas Wunner
@ 2020-05-15 15:58 ` Lukas Wunner
  2020-05-15 16:29   ` Mark Brown
  2020-05-15 15:58 ` [PATCH 3/5] spi: bcm2835aux: " Lukas Wunner
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2020-05-15 15:58 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi, linux-rpi-kernel

The BCM2835 SPI driver uses devm_spi_register_controller() on bind.
As a consequence, on unbind, __device_release_driver() first invokes
bcm2835_spi_remove() before unregistering the SPI controller via
devres_release_all().

This order is incorrect:  bcm2835_spi_remove() tears down the DMA
channels and turns off the SPI controller, including its interrupts
and clock.  The SPI controller is thus no longer usable.

When the SPI controller is subsequently unregistered, it unbinds all
its slave devices.  If their drivers need to access the SPI bus,
e.g. to quiesce their interrupts, unbinding will fail.

As a rule, devm_spi_register_controller() must not be used if the
->remove() hook performs teardown steps which shall be performed
after unbinding of slaves.

Fix by using the non-devm variant spi_register_controller().  Note that
the struct spi_controller as well as the driver-private data are not
freed until after bcm2835_spi_remove() has finished, so accessing them
is safe.

Fixes: 247263dba208 ("spi: bcm2835: use devm_spi_register_master()")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.13+
---
 drivers/spi/spi-bcm2835.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index cad3458e23ed..06d2782d38ec 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -1351,7 +1351,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 		goto out_dma_release;
 	}
 
-	err = devm_spi_register_controller(&pdev->dev, ctlr);
+	err = spi_register_controller(ctlr);
 	if (err) {
 		dev_err(&pdev->dev, "could not register SPI controller: %d\n",
 			err);
@@ -1378,6 +1378,8 @@ static int bcm2835_spi_remove(struct platform_device *pdev)
 
 	bcm2835_debugfs_remove(bs);
 
+	spi_unregister_controller(ctlr);
+
 	/* Clear FIFOs, and disable the HW block */
 	bcm2835_wr(bs, BCM2835_SPI_CS,
 		   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] spi: bcm2835aux: Fix controller unregister order
  2020-05-15 15:58 [PATCH 0/5] Raspberry Pi SPI unbind fixes Lukas Wunner
  2020-05-15 15:58 ` [PATCH 1/5] spi: Fix controller unregister order Lukas Wunner
  2020-05-15 15:58 ` [PATCH 2/5] spi: bcm2835: " Lukas Wunner
@ 2020-05-15 15:58 ` Lukas Wunner
  2020-05-15 15:58 ` [PATCH 4/5] spi: bcm2835: Tear down DMA before turning off SPI controller Lukas Wunner
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Lukas Wunner @ 2020-05-15 15:58 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi, linux-rpi-kernel

The BCM2835aux SPI driver uses devm_spi_register_master() on bind.
As a consequence, on unbind, __device_release_driver() first invokes
bcm2835aux_spi_remove() before unregistering the SPI controller via
devres_release_all().

This order is incorrect:  bcm2835aux_spi_remove() turns off the SPI
controller, including its interrupts and clock.  The SPI controller
is thus no longer usable.

When the SPI controller is subsequently unregistered, it unbinds all
its slave devices.  If their drivers need to access the SPI bus,
e.g. to quiesce their interrupts, unbinding will fail.

As a rule, devm_spi_register_master() must not be used if the
->remove() hook performs teardown steps which shall be performed
after unbinding of slaves.

Fix by using the non-devm variant spi_register_master().  Note that the
struct spi_master as well as the driver-private data are not freed until
after bcm2835aux_spi_remove() has finished, so accessing them is safe.

Fixes: 1ea29b39f4c8 ("spi: bcm2835aux: add bcm2835 auxiliary spi device driver")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v4.4+
Cc: Martin Sperl <kernel@martin.sperl.org>
---
 drivers/spi/spi-bcm2835aux.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-bcm2835aux.c b/drivers/spi/spi-bcm2835aux.c
index a2162ff56a12..c331efd6e86b 100644
--- a/drivers/spi/spi-bcm2835aux.c
+++ b/drivers/spi/spi-bcm2835aux.c
@@ -569,7 +569,7 @@ static int bcm2835aux_spi_probe(struct platform_device *pdev)
 		goto out_clk_disable;
 	}
 
-	err = devm_spi_register_master(&pdev->dev, master);
+	err = spi_register_master(master);
 	if (err) {
 		dev_err(&pdev->dev, "could not register SPI master: %d\n", err);
 		goto out_clk_disable;
@@ -593,6 +593,8 @@ static int bcm2835aux_spi_remove(struct platform_device *pdev)
 
 	bcm2835aux_debugfs_remove(bs);
 
+	spi_unregister_master(master);
+
 	bcm2835aux_spi_reset_hw(bs);
 
 	/* disable the HW block by releasing the clock */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] spi: bcm2835: Tear down DMA before turning off SPI controller
  2020-05-15 15:58 [PATCH 0/5] Raspberry Pi SPI unbind fixes Lukas Wunner
                   ` (2 preceding siblings ...)
  2020-05-15 15:58 ` [PATCH 3/5] spi: bcm2835aux: " Lukas Wunner
@ 2020-05-15 15:58 ` Lukas Wunner
  2020-05-15 15:58 ` [PATCH 5/5] spi: Document devm_spi_register_controller() gotcha Lukas Wunner
  2020-05-20 17:17 ` [PATCH 0/5] Raspberry Pi SPI unbind fixes Mark Brown
  5 siblings, 0 replies; 18+ messages in thread
From: Lukas Wunner @ 2020-05-15 15:58 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi, linux-rpi-kernel

On unbind of the BCM2835 SPI driver, the SPI controller is disabled
first and the DMA channels are terminated and torn down afterwards.

This seems backwards:  In the theoretical case that DMA is active,
it might try to fill the SPI FIFOs even after the controller has
been disabled.

Reverse the order, thereby mirroring what's done on ->probe().

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/spi/spi-bcm2835.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 06d2782d38ec..20d8581fdf88 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -1380,14 +1380,14 @@ static int bcm2835_spi_remove(struct platform_device *pdev)
 
 	spi_unregister_controller(ctlr);
 
+	bcm2835_dma_release(ctlr, bs);
+
 	/* Clear FIFOs, and disable the HW block */
 	bcm2835_wr(bs, BCM2835_SPI_CS,
 		   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);
 
 	clk_disable_unprepare(bs->clk);
 
-	bcm2835_dma_release(ctlr, bs);
-
 	return 0;
 }
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] spi: Document devm_spi_register_controller() gotcha
  2020-05-15 15:58 [PATCH 0/5] Raspberry Pi SPI unbind fixes Lukas Wunner
                   ` (3 preceding siblings ...)
  2020-05-15 15:58 ` [PATCH 4/5] spi: bcm2835: Tear down DMA before turning off SPI controller Lukas Wunner
@ 2020-05-15 15:58 ` Lukas Wunner
  2020-05-15 16:30   ` Mark Brown
  2020-05-20 17:17 ` [PATCH 0/5] Raspberry Pi SPI unbind fixes Mark Brown
  5 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2020-05-15 15:58 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi, linux-rpi-kernel

As a rule, devm_spi_register_controller() must not be used if the
driver's ->remove() hook performs teardown steps which shall be
performed after unbinding of slaves.

Dozens of drivers are doing it wrong.  Document this gotcha to
hopefully prevent further misuse.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/spi/spi.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index d32bdc6cbf66..e1a35aa7eeb8 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2714,6 +2714,13 @@ static void devm_spi_unregister(struct device *dev, void *res)
  * Register a SPI device as with spi_register_controller() which will
  * automatically be unregistered and freed.
  *
+ * Be aware that a managed SPI controller and the attached slaves are
+ * unregistered after the driver's ->remove() callback has been executed.
+ * So the SPI slaves may still access the bus during and after ->remove().
+ * Thus, devm_spi_register_controller() may only be used if there is no
+ * ->remove() callback at all or if it does not perform teardown steps
+ * which render the bus inaccessible.
+ *
  * Return: zero on success, else a negative error code.
  */
 int devm_spi_register_controller(struct device *dev,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] spi: Fix controller unregister order
  2020-05-15 15:58 ` [PATCH 1/5] spi: Fix controller unregister order Lukas Wunner
@ 2020-05-15 16:27   ` Mark Brown
  2020-05-15 16:31     ` Lukas Wunner
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Brown @ 2020-05-15 16:27 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel, Linus Walleij

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

On Fri, May 15, 2020 at 05:58:01PM +0200, Lukas Wunner wrote:

> However since commit ffbbdd21329f ("spi: create a message queueing
> infrastructure"), spi_destroy_queue() is executed before unbinding the
> slaves.  It sets ctlr->running = false, thereby preventing SPI bus
> access and causing unbinding of slave devices to fail.

Devices should basically never fail an unbind operation - if something
went seriously wrong there's basically nothing that can be done in terms
of error handling and keeping the device around isn't going to help.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] spi: bcm2835: Fix controller unregister order
  2020-05-15 15:58 ` [PATCH 2/5] spi: bcm2835: " Lukas Wunner
@ 2020-05-15 16:29   ` Mark Brown
  2020-05-15 16:44     ` Lukas Wunner
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Brown @ 2020-05-15 16:29 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi, linux-rpi-kernel

[-- Attachment #1: Type: text/plain, Size: 512 bytes --]

On Fri, May 15, 2020 at 05:58:02PM +0200, Lukas Wunner wrote:

> Fix by using the non-devm variant spi_register_controller().  Note that
> the struct spi_controller as well as the driver-private data are not
> freed until after bcm2835_spi_remove() has finished, so accessing them
> is safe.

Why not use managed allocations of clocks and DMA channels?  This is a
standard issue with the devm APIs, if you're using them you basically
need to use them for *everything* up to the point where you start using
them.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] spi: Document devm_spi_register_controller() gotcha
  2020-05-15 15:58 ` [PATCH 5/5] spi: Document devm_spi_register_controller() gotcha Lukas Wunner
@ 2020-05-15 16:30   ` Mark Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2020-05-15 16:30 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi, linux-rpi-kernel

[-- Attachment #1: Type: text/plain, Size: 460 bytes --]

On Fri, May 15, 2020 at 05:58:05PM +0200, Lukas Wunner wrote:
> As a rule, devm_spi_register_controller() must not be used if the
> driver's ->remove() hook performs teardown steps which shall be
> performed after unbinding of slaves.
> 
> Dozens of drivers are doing it wrong.  Document this gotcha to
> hopefully prevent further misuse.

This is something that needs to be documented at the devm level, it
applies to pretty much every managed API.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] spi: Fix controller unregister order
  2020-05-15 16:27   ` Mark Brown
@ 2020-05-15 16:31     ` Lukas Wunner
  2020-05-15 21:37       ` Andy Shevchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2020-05-15 16:31 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel, Linus Walleij

On Fri, May 15, 2020 at 05:27:25PM +0100, Mark Brown wrote:
> On Fri, May 15, 2020 at 05:58:01PM +0200, Lukas Wunner wrote:
> > However since commit ffbbdd21329f ("spi: create a message queueing
> > infrastructure"), spi_destroy_queue() is executed before unbinding the
> > slaves.  It sets ctlr->running = false, thereby preventing SPI bus
> > access and causing unbinding of slave devices to fail.
> 
> Devices should basically never fail an unbind operation - if something
> went seriously wrong there's basically nothing that can be done in terms
> of error handling and keeping the device around isn't going to help.

I guess the word "fail" in the commit message invites misinterpretations.
The driver does unbind from the slave device, but the physical device is
not left in a proper state.  E.g. interrupts may still be generated by
the device because writing a register to disable them failed.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] spi: bcm2835: Fix controller unregister order
  2020-05-15 16:29   ` Mark Brown
@ 2020-05-15 16:44     ` Lukas Wunner
  2020-05-15 21:33       ` Andy Shevchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2020-05-15 16:44 UTC (permalink / raw)
  To: Mark Brown
  Cc: Nicolas Saenz Julienne, Martin Sperl, linux-spi, linux-rpi-kernel

On Fri, May 15, 2020 at 05:29:03PM +0100, Mark Brown wrote:
> On Fri, May 15, 2020 at 05:58:02PM +0200, Lukas Wunner wrote:
> > Fix by using the non-devm variant spi_register_controller().  Note that
> > the struct spi_controller as well as the driver-private data are not
> > freed until after bcm2835_spi_remove() has finished, so accessing them
> > is safe.
> 
> Why not use managed allocations of clocks and DMA channels?  This is a
> standard issue with the devm APIs, if you're using them you basically
> need to use them for *everything* up to the point where you start using
> them.

There is no devm version of clk_prepare_enable(), dma_request_chan()
and various other functions invoked on ->probe() by spi-bcm2835.c.
So tearing down DMA channels, disabling clocks etc needs to happen
in the ->remove() hook and consequently devm_spi_register_controller()
cannot be used.

So I respectfully submit that the patch is fine.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] spi: bcm2835: Fix controller unregister order
  2020-05-15 16:44     ` Lukas Wunner
@ 2020-05-15 21:33       ` Andy Shevchenko
  2020-05-16  6:56         ` Lukas Wunner
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Shevchenko @ 2020-05-15 21:33 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Mark Brown, Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel

On Fri, May 15, 2020 at 7:45 PM Lukas Wunner <lukas@wunner.de> wrote:
> On Fri, May 15, 2020 at 05:29:03PM +0100, Mark Brown wrote:
> > On Fri, May 15, 2020 at 05:58:02PM +0200, Lukas Wunner wrote:
> > > Fix by using the non-devm variant spi_register_controller().  Note that
> > > the struct spi_controller as well as the driver-private data are not
> > > freed until after bcm2835_spi_remove() has finished, so accessing them
> > > is safe.
> >
> > Why not use managed allocations of clocks and DMA channels?  This is a
> > standard issue with the devm APIs, if you're using them you basically
> > need to use them for *everything* up to the point where you start using
> > them.
>
> There is no devm version of clk_prepare_enable(), dma_request_chan()
> and various other functions invoked on ->probe() by spi-bcm2835.c.
> So tearing down DMA channels, disabling clocks etc needs to happen
> in the ->remove() hook and consequently devm_spi_register_controller()
> cannot be used.

There is devm_add_action_or_reset (IIRC the name). It does a trick.


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] spi: Fix controller unregister order
  2020-05-15 16:31     ` Lukas Wunner
@ 2020-05-15 21:37       ` Andy Shevchenko
  2020-05-16  6:45         ` Lukas Wunner
  0 siblings, 1 reply; 18+ messages in thread
From: Andy Shevchenko @ 2020-05-15 21:37 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Mark Brown, Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel, Linus Walleij

On Fri, May 15, 2020 at 7:41 PM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Fri, May 15, 2020 at 05:27:25PM +0100, Mark Brown wrote:
> > On Fri, May 15, 2020 at 05:58:01PM +0200, Lukas Wunner wrote:
> > > However since commit ffbbdd21329f ("spi: create a message queueing
> > > infrastructure"), spi_destroy_queue() is executed before unbinding the
> > > slaves.  It sets ctlr->running = false, thereby preventing SPI bus
> > > access and causing unbinding of slave devices to fail.
> >
> > Devices should basically never fail an unbind operation - if something
> > went seriously wrong there's basically nothing that can be done in terms
> > of error handling and keeping the device around isn't going to help.
>
> I guess the word "fail" in the commit message invites misinterpretations.
> The driver does unbind from the slave device, but the physical device is
> not left in a proper state.  E.g. interrupts may still be generated by
> the device because writing a register to disable them failed.

I didn't check a patch, but I see a bug on kernel bugzilla against
spi-pxa2xx because of this. It requires quite untrivial ->remove() in
order to quiescent the DMA and other activities.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] spi: Fix controller unregister order
  2020-05-15 21:37       ` Andy Shevchenko
@ 2020-05-16  6:45         ` Lukas Wunner
  2020-05-16  9:45           ` Andy Shevchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2020-05-16  6:45 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Mark Brown, Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel, Linus Walleij

On Sat, May 16, 2020 at 12:37:17AM +0300, Andy Shevchenko wrote:
> On Fri, May 15, 2020 at 7:41 PM Lukas Wunner <lukas@wunner.de> wrote:
> > On Fri, May 15, 2020 at 05:27:25PM +0100, Mark Brown wrote:
> > > On Fri, May 15, 2020 at 05:58:01PM +0200, Lukas Wunner wrote:
> > > > However since commit ffbbdd21329f ("spi: create a message queueing
> > > > infrastructure"), spi_destroy_queue() is executed before unbinding the
> > > > slaves.  It sets ctlr->running = false, thereby preventing SPI bus
> > > > access and causing unbinding of slave devices to fail.
> > >
> > > Devices should basically never fail an unbind operation - if something
> > > went seriously wrong there's basically nothing that can be done in terms
> > > of error handling and keeping the device around isn't going to help.
> >
> > I guess the word "fail" in the commit message invites misinterpretations.
> > The driver does unbind from the slave device, but the physical device is
> > not left in a proper state.  E.g. interrupts may still be generated by
> > the device because writing a register to disable them failed.
> 
> I didn't check a patch, but I see a bug on kernel bugzilla against
> spi-pxa2xx because of this. It requires quite untrivial ->remove() in
> order to quiescent the DMA and other activities.

Yes from a quick look at spi-pxa2xx.c it's immediately obvious that
the use of devm_spi_register_controller() is likewise completely wrong.

The crucial thing to understand is that the SPI driver's ->remove()
hook is executed *before* any device-managed resources are released.
pxa2xx_spi_remove() disables the clock, frees the IRQ, releases DMA,
so the SPI controller is no longer usable even though it's still
registered!  Somehow this incorrect order got cargo-culted to dozens
of drivers over the years.

We use SPI-attached Ethernet chips and when the SPI driver's module
is unloaded, the Ethernet driver's ->ndo_stop() hook is executed to
bring down the interface.  For this it needs to communicate with the
Ethernet chip, but it can't because the ->remove() hook has already been
executed and unbinding of the SPI slave happens afterwards, when the
SPI controller is unregistered via devres_release_all().

There's another issue in spi-pxa2xx.c:  It acquires a runtime PM ref
even though the driver core already does that.

Do you have a link to the spi-pxa2xx.c bugzilla?  Are you able to
test patches?  I can submit a patch but I can only compile-test it,
I don't have that hardware.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] spi: bcm2835: Fix controller unregister order
  2020-05-15 21:33       ` Andy Shevchenko
@ 2020-05-16  6:56         ` Lukas Wunner
  2020-05-16  9:40           ` Andy Shevchenko
  0 siblings, 1 reply; 18+ messages in thread
From: Lukas Wunner @ 2020-05-16  6:56 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Mark Brown, Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel

On Sat, May 16, 2020 at 12:33:00AM +0300, Andy Shevchenko wrote:
> On Fri, May 15, 2020 at 7:45 PM Lukas Wunner <lukas@wunner.de> wrote:
> > On Fri, May 15, 2020 at 05:29:03PM +0100, Mark Brown wrote:
> > > On Fri, May 15, 2020 at 05:58:02PM +0200, Lukas Wunner wrote:
> > > > Fix by using the non-devm variant spi_register_controller().  Note that
> > > > the struct spi_controller as well as the driver-private data are not
> > > > freed until after bcm2835_spi_remove() has finished, so accessing them
> > > > is safe.
> > >
> > > Why not use managed allocations of clocks and DMA channels?  This is a
> > > standard issue with the devm APIs, if you're using them you basically
> > > need to use them for *everything* up to the point where you start using
> > > them.
> >
> > There is no devm version of clk_prepare_enable(), dma_request_chan()
> > and various other functions invoked on ->probe() by spi-bcm2835.c.
> > So tearing down DMA channels, disabling clocks etc needs to happen
> > in the ->remove() hook and consequently devm_spi_register_controller()
> > cannot be used.
> 
> There is devm_add_action_or_reset (IIRC the name). It does a trick.

Interesting, thanks.

However there are currently four actions performed by bcm2835_spi_remove():

	bcm2835_debugfs_remove(bs);

	/* Clear FIFOs, and disable the HW block */
	bcm2835_wr(bs, BCM2835_SPI_CS,
		   BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX);

	clk_disable_unprepare(bs->clk);

	bcm2835_dma_release(ctlr, bs);

So I think I'd have to add four functions to perform these devm actions,
which would add a lot more code than just the single line added by my
patch.  It also seems doubtful that the teardown code will still be easy
to follow.  And small patches like the ones I've submitted lend themselves
better to backporting to stable.

Mark, please provide guidance as to which variant you'd prefer:
Switching to the non-devm variant of spi_register_controller() as I've
done here or adding devm calls for all the existing teardown steps.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] spi: bcm2835: Fix controller unregister order
  2020-05-16  6:56         ` Lukas Wunner
@ 2020-05-16  9:40           ` Andy Shevchenko
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Shevchenko @ 2020-05-16  9:40 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Mark Brown, Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel

On Sat, May 16, 2020 at 9:56 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Sat, May 16, 2020 at 12:33:00AM +0300, Andy Shevchenko wrote:
> > On Fri, May 15, 2020 at 7:45 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > On Fri, May 15, 2020 at 05:29:03PM +0100, Mark Brown wrote:
> > > > On Fri, May 15, 2020 at 05:58:02PM +0200, Lukas Wunner wrote:
> > > > > Fix by using the non-devm variant spi_register_controller().  Note that
> > > > > the struct spi_controller as well as the driver-private data are not
> > > > > freed until after bcm2835_spi_remove() has finished, so accessing them
> > > > > is safe.
> > > >
> > > > Why not use managed allocations of clocks and DMA channels?  This is a
> > > > standard issue with the devm APIs, if you're using them you basically
> > > > need to use them for *everything* up to the point where you start using
> > > > them.
> > >
> > > There is no devm version of clk_prepare_enable(), dma_request_chan()
> > > and various other functions invoked on ->probe() by spi-bcm2835.c.
> > > So tearing down DMA channels, disabling clocks etc needs to happen
> > > in the ->remove() hook and consequently devm_spi_register_controller()
> > > cannot be used.
> >
> > There is devm_add_action_or_reset (IIRC the name). It does a trick.
>
> Interesting, thanks.
>
> However there are currently four actions performed by bcm2835_spi_remove():
>

>         bcm2835_debugfs_remove(bs);

This one shouldn't be counted. You can init it as the last op in
->probe() and that mustn't fail the probe.

>         /* Clear FIFOs, and disable the HW block */
>         bcm2835_wr(bs, BCM2835_SPI_CS,
>                    BCM2835_SPI_CS_CLEAR_RX | BCM2835_SPI_CS_CLEAR_TX)

This one I don't know.

>         clk_disable_unprepare(bs->clk);

This one traditionally done via devm_add_action_or_reset.

>         bcm2835_dma_release(ctlr, bs);

This one probably the one which needs to be addressed ideally in DMA
engine for all.

>
> So I think I'd have to add four functions to perform these devm actions,
> which would add a lot more code than just the single line added by my
> patch.  It also seems doubtful that the teardown code will still be easy
> to follow.  And small patches like the ones I've submitted lend themselves
> better to backporting to stable.
>
> Mark, please provide guidance as to which variant you'd prefer:
> Switching to the non-devm variant of spi_register_controller() as I've
> done here or adding devm calls for all the existing teardown steps.

All generic devm_* APIs have a release counterparts, you can
explicitly call them in order you want during ->remove(). So, I still
see a benefit of devm_*() even in such cases (makes ->probe() much
easier).

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] spi: Fix controller unregister order
  2020-05-16  6:45         ` Lukas Wunner
@ 2020-05-16  9:45           ` Andy Shevchenko
  0 siblings, 0 replies; 18+ messages in thread
From: Andy Shevchenko @ 2020-05-16  9:45 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Mark Brown, Nicolas Saenz Julienne, Martin Sperl, linux-spi,
	linux-rpi-kernel, Linus Walleij

On Sat, May 16, 2020 at 9:45 AM Lukas Wunner <lukas@wunner.de> wrote:
>
> On Sat, May 16, 2020 at 12:37:17AM +0300, Andy Shevchenko wrote:
> > On Fri, May 15, 2020 at 7:41 PM Lukas Wunner <lukas@wunner.de> wrote:
> > > On Fri, May 15, 2020 at 05:27:25PM +0100, Mark Brown wrote:
> > > > On Fri, May 15, 2020 at 05:58:01PM +0200, Lukas Wunner wrote:
> > > > > However since commit ffbbdd21329f ("spi: create a message queueing
> > > > > infrastructure"), spi_destroy_queue() is executed before unbinding the
> > > > > slaves.  It sets ctlr->running = false, thereby preventing SPI bus
> > > > > access and causing unbinding of slave devices to fail.
> > > >
> > > > Devices should basically never fail an unbind operation - if something
> > > > went seriously wrong there's basically nothing that can be done in terms
> > > > of error handling and keeping the device around isn't going to help.
> > >
> > > I guess the word "fail" in the commit message invites misinterpretations.
> > > The driver does unbind from the slave device, but the physical device is
> > > not left in a proper state.  E.g. interrupts may still be generated by
> > > the device because writing a register to disable them failed.
> >
> > I didn't check a patch, but I see a bug on kernel bugzilla against
> > spi-pxa2xx because of this. It requires quite untrivial ->remove() in
> > order to quiescent the DMA and other activities.
>
> Yes from a quick look at spi-pxa2xx.c it's immediately obvious that
> the use of devm_spi_register_controller() is likewise completely wrong.
>
> The crucial thing to understand is that the SPI driver's ->remove()
> hook is executed *before* any device-managed resources are released.
> pxa2xx_spi_remove() disables the clock, frees the IRQ, releases DMA,
> so the SPI controller is no longer usable even though it's still
> registered!  Somehow this incorrect order got cargo-culted to dozens
> of drivers over the years.
>
> We use SPI-attached Ethernet chips and when the SPI driver's module
> is unloaded, the Ethernet driver's ->ndo_stop() hook is executed to
> bring down the interface.  For this it needs to communicate with the
> Ethernet chip, but it can't because the ->remove() hook has already been
> executed and unbinding of the SPI slave happens afterwards, when the
> SPI controller is unregistered via devres_release_all().
>
> There's another issue in spi-pxa2xx.c:  It acquires a runtime PM ref
> even though the driver core already does that.
>
> Do you have a link to the spi-pxa2xx.c bugzilla?  Are you able to
> test patches?  I can submit a patch but I can only compile-test it,

Here you are: https://bugzilla.kernel.org/show_bug.cgi?id=206403.
There also a link to my GH tree where I tried to clean up a bit. And
yes, I know about atomic handling bug there, but it's another story.

I was able to reproduce the bug once or twice, but submitter has a
test case with reproducibility close to 100%.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/5] Raspberry Pi SPI unbind fixes
  2020-05-15 15:58 [PATCH 0/5] Raspberry Pi SPI unbind fixes Lukas Wunner
                   ` (4 preceding siblings ...)
  2020-05-15 15:58 ` [PATCH 5/5] spi: Document devm_spi_register_controller() gotcha Lukas Wunner
@ 2020-05-20 17:17 ` Mark Brown
  5 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2020-05-20 17:17 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Martin Sperl, Linus Walleij, linux-spi, Jingoo Han,
	linux-rpi-kernel, Nicolas Saenz Julienne

On Fri, 15 May 2020 17:58:00 +0200, Lukas Wunner wrote:
> This series fixes ordering issues occurring on unbind of the
> Raspberry Pi SPI drivers:
> 
> Turns out devm_spi_register_controller() is prone to
> incorrect use and dozens of drivers have gotten it wrong.
> I'm only documenting this gotcha here and fixing it in the
> Raspberry Pi drivers.  Fixing the rest is for another day.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/4] spi: Fix controller unregister order
      commit: 84855678add8aba927faf76bc2f130a40f94b6f7
[2/4] spi: bcm2835: Fix controller unregister order
      commit: 9dd277ff92d06f6aa95b39936ad83981d781f49b
[3/4] spi: bcm2835aux: Fix controller unregister order
      commit: b9dd3f6d417258ad0beeb292a1bc74200149f15d
[4/4] spi: bcm2835: Tear down DMA before turning off SPI controller
      (no commit info)

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-05-20 17:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-15 15:58 [PATCH 0/5] Raspberry Pi SPI unbind fixes Lukas Wunner
2020-05-15 15:58 ` [PATCH 1/5] spi: Fix controller unregister order Lukas Wunner
2020-05-15 16:27   ` Mark Brown
2020-05-15 16:31     ` Lukas Wunner
2020-05-15 21:37       ` Andy Shevchenko
2020-05-16  6:45         ` Lukas Wunner
2020-05-16  9:45           ` Andy Shevchenko
2020-05-15 15:58 ` [PATCH 2/5] spi: bcm2835: " Lukas Wunner
2020-05-15 16:29   ` Mark Brown
2020-05-15 16:44     ` Lukas Wunner
2020-05-15 21:33       ` Andy Shevchenko
2020-05-16  6:56         ` Lukas Wunner
2020-05-16  9:40           ` Andy Shevchenko
2020-05-15 15:58 ` [PATCH 3/5] spi: bcm2835aux: " Lukas Wunner
2020-05-15 15:58 ` [PATCH 4/5] spi: bcm2835: Tear down DMA before turning off SPI controller Lukas Wunner
2020-05-15 15:58 ` [PATCH 5/5] spi: Document devm_spi_register_controller() gotcha Lukas Wunner
2020-05-15 16:30   ` Mark Brown
2020-05-20 17:17 ` [PATCH 0/5] Raspberry Pi SPI unbind fixes Mark Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).