linux-spi.vger.kernel.org archive mirror
* [PATCH 0/1] spi: spi-bcm2835: Fix deadlock
@ 2021-07-16 21:02 alexandru.tachici
  2021-07-16 21:02 ` [PATCH 1/1] " alexandru.tachici
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: alexandru.tachici @ 2021-07-16 21:02 UTC (permalink / raw)
  To: linux-kernel, linux-spi, broonie
  Cc: nuno.sa, bootc, swarren, bcm-kernel-feedback-list, rjui,
	f.fainelli, nsaenz, Alexandru Tachici

From: Alexandru Tachici <alexandru.tachici@analog.com>

The bcm2835_spi_transfer_one function can create a deadlock
if it is called while another thread already has the
CCF lock.

This behavior was observed at boot and when trying to
print the clk_summary debugfs. I had registered
at the time multiple clocks of AD9545 through the CCF.
Tested this using an RPi 4 connected to AD9545 through SPI.

See upstream attempt here:
https://lore.kernel.org/lkml/20210614070718.78041-3-alexandru.tachici@analog.com/T/

This can happen with any other clock that needs to read
its rate/phase from hardware over SPI, because
when issuing a clk_get_rate/phase, the requesting thread
already holds the CCF lock. If another thread, in this case
the one that performs the SPI transfer, tries to take the same
lock, it will cause a deadlock. This only happens by chance
because not every SPI request gets deferred to a kthread.
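
The locking pattern described above can be modelled in userspace. This is only a sketch under stated assumptions: a pthread mutex stands in for the CCF prepare lock, and the names prepare_lock, transfer_worker and rate_query_would_deadlock are hypothetical, not kernel symbols used by this patch. The worker uses a trylock so the example reports the conflict instead of hanging.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the global CCF lock. */
static pthread_mutex_t prepare_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the SPI worker thread: the transfer path asks for the clock
 * rate and therefore needs the same lock. trylock is used so that the
 * sketch can report the conflict rather than block forever.
 */
static void *transfer_worker(void *arg)
{
	int *blocked = arg;

	if (pthread_mutex_trylock(&prepare_lock) == 0) {
		*blocked = 0;
		pthread_mutex_unlock(&prepare_lock);
	} else {
		*blocked = 1; /* a blocking lock here would never return */
	}
	return NULL;
}

/*
 * Models the thread that queries a rate from an SPI-attached clock:
 * it takes the lock, hands the transfer to the worker and waits for
 * it to finish while still holding the lock.
 * Returns 1 if the worker could not take the lock, 0 if it could,
 * -1 if the worker thread could not be created.
 */
static int rate_query_would_deadlock(void)
{
	pthread_t worker;
	int blocked = 0;

	pthread_mutex_lock(&prepare_lock);
	if (pthread_create(&worker, NULL, transfer_worker, &blocked) != 0) {
		pthread_mutex_unlock(&prepare_lock);
		return -1;
	}
	pthread_join(worker, NULL);
	pthread_mutex_unlock(&prepare_lock);

	return blocked;
}
```

In this model the worker always finds the lock taken. With a blocking lock instead of the trylock, the waiting thread and the worker would each wait on the other, which is the deadlock reported here. When the transfer runs in the calling thread rather than in a worker, the kernel's lock is taken again by its current owner, which it permits, so no deadlock occurs.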

Alexandru Tachici (1):
  spi: spi-bcm2835: Fix deadlock

 drivers/spi/spi-bcm2835.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

--
2.25.1


* [PATCH 1/1] spi: spi-bcm2835: Fix deadlock
  2021-07-16 21:02 [PATCH 0/1] spi: spi-bcm2835: Fix deadlock alexandru.tachici
@ 2021-07-16 21:02 ` alexandru.tachici
  2021-07-19 23:50   ` Florian Fainelli
  2021-07-20 12:33 ` [PATCH 0/1] " Mark Brown
  2021-07-20 18:48 ` Mark Brown
  2 siblings, 1 reply; 7+ messages in thread
From: alexandru.tachici @ 2021-07-16 21:02 UTC (permalink / raw)
  To: linux-kernel, linux-spi, broonie
  Cc: nuno.sa, bootc, swarren, bcm-kernel-feedback-list, rjui,
	f.fainelli, nsaenz, Alexandru Tachici

From: Alexandru Tachici <alexandru.tachici@analog.com>

The bcm2835_spi_transfer_one function can create a deadlock
if it is called while another thread already has the
CCF lock.

Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Fixes: f8043872e796 ("spi: add driver for BCM2835")
---
 drivers/spi/spi-bcm2835.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
index 5f8771fe1a31..775c0bf2f923 100644
--- a/drivers/spi/spi-bcm2835.c
+++ b/drivers/spi/spi-bcm2835.c
@@ -83,6 +83,7 @@ MODULE_PARM_DESC(polling_limit_us,
  * struct bcm2835_spi - BCM2835 SPI controller
  * @regs: base address of register map
  * @clk: core clock, divided to calculate serial clock
+ * @clk_hz: core clock cached speed
  * @irq: interrupt, signals TX FIFO empty or RX FIFO ¾ full
  * @tfr: SPI transfer currently processed
  * @ctlr: SPI controller reverse lookup
@@ -116,6 +117,7 @@ MODULE_PARM_DESC(polling_limit_us,
 struct bcm2835_spi {
 	void __iomem *regs;
 	struct clk *clk;
+	unsigned long clk_hz;
 	int irq;
 	struct spi_transfer *tfr;
 	struct spi_controller *ctlr;
@@ -1045,19 +1047,18 @@ static int bcm2835_spi_transfer_one(struct spi_controller *ctlr,
 {
 	struct bcm2835_spi *bs = spi_controller_get_devdata(ctlr);
 	struct bcm2835_spidev *slv = spi_get_ctldata(spi);
-	unsigned long spi_hz, clk_hz, cdiv;
+	unsigned long spi_hz, cdiv;
 	unsigned long hz_per_byte, byte_limit;
 	u32 cs = slv->prepare_cs;
 
 	/* set clock */
 	spi_hz = tfr->speed_hz;
-	clk_hz = clk_get_rate(bs->clk);
 
-	if (spi_hz >= clk_hz / 2) {
+	if (spi_hz >= bs->clk_hz / 2) {
 		cdiv = 2; /* clk_hz/2 is the fastest we can go */
 	} else if (spi_hz) {
 		/* CDIV must be a multiple of two */
-		cdiv = DIV_ROUND_UP(clk_hz, spi_hz);
+		cdiv = DIV_ROUND_UP(bs->clk_hz, spi_hz);
 		cdiv += (cdiv % 2);
 
 		if (cdiv >= 65536)
@@ -1065,7 +1066,7 @@ static int bcm2835_spi_transfer_one(struct spi_controller *ctlr,
 	} else {
 		cdiv = 0; /* 0 is the slowest we can go */
 	}
-	tfr->effective_speed_hz = cdiv ? (clk_hz / cdiv) : (clk_hz / 65536);
+	tfr->effective_speed_hz = cdiv ? (bs->clk_hz / cdiv) : (bs->clk_hz / 65536);
 	bcm2835_wr(bs, BCM2835_SPI_CLK, cdiv);
 
 	/* handle all the 3-wire mode */
@@ -1354,6 +1355,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
 		return bs->irq ? bs->irq : -ENODEV;
 
 	clk_prepare_enable(bs->clk);
+	bs->clk_hz = clk_get_rate(bs->clk);
 
 	err = bcm2835_dma_init(ctlr, &pdev->dev, bs);
 	if (err)
-- 
2.25.1
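
For readers who want to check the divider arithmetic outside the kernel, the calculation in the patched bcm2835_spi_transfer_one() can be sketched as a standalone function working from a cached rate. This is an illustration, not the driver code: struct spi_state, div_round_up, pick_cdiv and effective_speed_hz are hypothetical names, and the statement under "if (cdiv >= 65536)" falls between two hunks and is not shown in the diff, so the sketch assumes it sets cdiv to 0.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's per-controller state. */
struct spi_state {
	unsigned long clk_hz; /* core clock rate, read once at probe time */
};

/* Round-up integer division, in the spirit of the kernel's DIV_ROUND_UP. */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
	return (n + d - 1) / d;
}

/* Pick the clock divider for a requested SPI speed from the cached rate. */
static unsigned long pick_cdiv(const struct spi_state *bs, unsigned long spi_hz)
{
	unsigned long cdiv;

	assert(bs->clk_hz > 0);

	if (spi_hz >= bs->clk_hz / 2) {
		cdiv = 2; /* clk_hz/2 is the fastest we can go */
	} else if (spi_hz) {
		/* CDIV must be a multiple of two */
		cdiv = div_round_up(bs->clk_hz, spi_hz);
		cdiv += cdiv % 2;

		/* Assumption: this line is elided between hunks in the diff. */
		if (cdiv >= 65536)
			cdiv = 0;
	} else {
		cdiv = 0; /* 0 is the slowest we can go */
	}
	return cdiv;
}

/* Speed actually produced by a divider; 0 selects a divide by 65536. */
static unsigned long effective_speed_hz(const struct spi_state *bs,
					unsigned long cdiv)
{
	return cdiv ? bs->clk_hz / cdiv : bs->clk_hz / 65536;
}
```

With an assumed 500 MHz core clock, a 3 MHz request rounds up to a divider of 167, is bumped to the even value 168, and yields an effective speed of 2976190 Hz. No call into the clock framework is needed on this path, which is the point of caching the rate in probe.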



* Re: [PATCH 1/1] spi: spi-bcm2835: Fix deadlock
  2021-07-16 21:02 ` [PATCH 1/1] " alexandru.tachici
@ 2021-07-19 23:50   ` Florian Fainelli
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Fainelli @ 2021-07-19 23:50 UTC (permalink / raw)
  To: alexandru.tachici, linux-kernel, linux-spi, broonie
  Cc: nuno.sa, bootc, swarren, bcm-kernel-feedback-list, rjui,
	f.fainelli, nsaenz

On 7/16/21 2:02 PM, alexandru.tachici@analog.com wrote:
> From: Alexandru Tachici <alexandru.tachici@analog.com>
> 
> The bcm2835_spi_transfer_one function can create a deadlock
> if it is called while another thread already has the
> CCF lock.
> 
> Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
> Fixes: f8043872e796 ("spi: add driver for BCM2835")

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian


* Re: [PATCH 0/1] spi: spi-bcm2835: Fix deadlock
  2021-07-16 21:02 [PATCH 0/1] spi: spi-bcm2835: Fix deadlock alexandru.tachici
  2021-07-16 21:02 ` [PATCH 1/1] " alexandru.tachici
@ 2021-07-20 12:33 ` Mark Brown
  2021-07-20 18:48 ` Mark Brown
  2 siblings, 0 replies; 7+ messages in thread
From: Mark Brown @ 2021-07-20 12:33 UTC (permalink / raw)
  To: alexandru.tachici
  Cc: linux-kernel, linux-spi, nuno.sa, bootc, swarren,
	bcm-kernel-feedback-list, rjui, f.fainelli, nsaenz


On Sat, Jul 17, 2021 at 12:02:44AM +0300, alexandru.tachici@analog.com wrote:
> From: Alexandru Tachici <alexandru.tachici@analog.com>
> 
> The bcm2835_spi_transfer_one function can create a deadlock
> if it is called while another thread already has the
> CCF lock.

Please don't send cover letters for single patches, if there is anything
that needs saying put it in the changelog of the patch or after the ---
if it's administrative stuff.  This reduces mail volume and ensures that 
any important information is recorded in the changelog rather than being
lost. 


* Re: [PATCH 0/1] spi: spi-bcm2835: Fix deadlock
  2021-07-16 21:02 [PATCH 0/1] spi: spi-bcm2835: Fix deadlock alexandru.tachici
  2021-07-16 21:02 ` [PATCH 1/1] " alexandru.tachici
  2021-07-20 12:33 ` [PATCH 0/1] " Mark Brown
@ 2021-07-20 18:48 ` Mark Brown
  2021-07-21  6:47   ` Sa, Nuno
  2 siblings, 1 reply; 7+ messages in thread
From: Mark Brown @ 2021-07-20 18:48 UTC (permalink / raw)
  To: alexandru.tachici, linux-kernel, linux-spi
  Cc: Mark Brown, nsaenz, f.fainelli, rjui, swarren,
	bcm-kernel-feedback-list, bootc, nuno.sa

On Sat, 17 Jul 2021 00:02:44 +0300, alexandru.tachici@analog.com wrote:
> The bcm2835_spi_transfer_one function can create a deadlock
> if it is called while another thread already has the
> CCF lock.
> 
> This behavior was observed at boot and when trying to
> print the clk_summary debugfs. I had registered
> at the time multiple clocks of AD9545 through the CCF.
> Tested this using an RPi 4 connected to AD9545 through SPI.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next

Thanks!

[1/1] spi: spi-bcm2835: Fix deadlock
      commit: c45c1e82bba130db4f19d9dbc1deefcf4ea994ed

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


* RE: [PATCH 0/1] spi: spi-bcm2835: Fix deadlock
  2021-07-20 18:48 ` Mark Brown
@ 2021-07-21  6:47   ` Sa, Nuno
  2021-07-21 12:32     ` Mark Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Sa, Nuno @ 2021-07-21  6:47 UTC (permalink / raw)
  To: Mark Brown, Tachici, Alexandru, linux-kernel, linux-spi
  Cc: nsaenz, f.fainelli, rjui, swarren, bcm-kernel-feedback-list, bootc

Hi all,

> From: Mark Brown <broonie@kernel.org>
> Sent: Tuesday, July 20, 2021 8:48 PM
> To: Tachici, Alexandru <Alexandru.Tachici@analog.com>;
>     linux-kernel@vger.kernel.org; linux-spi@vger.kernel.org
> Cc: Mark Brown <broonie@kernel.org>; nsaenz@kernel.org;
> f.fainelli@gmail.com; rjui@broadcom.com; swarren@wwwdotorg.org;
> bcm-kernel-feedback-list@broadcom.com; bootc@bootc.net; Sa,
> Nuno <Nuno.Sa@analog.com>
> Subject: Re: [PATCH 0/1] spi: spi-bcm2835: Fix deadlock
> 
> On Sat, 17 Jul 2021 00:02:44 +0300, alexandru.tachici@analog.com
> wrote:
> > The bcm2835_spi_transfer_one function can create a deadlock
> > if it is called while another thread already has the
> > CCF lock.
> >
> > This behavior was observed at boot and when trying to
> > print the clk_summary debugfs. I had registered
> > at the time multiple clocks of AD9545 through the CCF.
> > Tested this using an RPi 4 connected to AD9545 through SPI.
> >
> > [...]
> 
> Applied to
> 
> 
>    https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next
> 
> Thanks!
> 
> [1/1] spi: spi-bcm2835: Fix deadlock
>       commit: c45c1e82bba130db4f19d9dbc1deefcf4ea994ed
> 
> All being well this means that it will be integrated into the linux-next
> tree (usually sometime in the next 24 hours) and sent to Linus during
> the next merge window (or sooner if it is a bug fix), however if
> problems are discovered then the patch may be dropped or reverted.
> 
> You may get further e-mails resulting from automated or manual testing
> and review of the tree, please engage with people reporting problems and
> send followup patches addressing any issues that are reported if needed.
> 
> If any updates are required or you are submitting further changes they
> should be sent as incremental updates against current git, existing
> patches will not be replaced.
> 
> Please add any relevant lists and maintainers to the CCs when replying
> to this mail.
> 
> Thanks,
> Mark

I'm really curious about this one and how we should proceed. Maybe this is not
new (just to me) and the way to go is just to "fix" the spi controller when we hit the
issue? I'm asking because there's a more fundamental problem when these pieces
are combined (CCF + SPI). What I mean is that this can potentially happen in every
system that happens to have a spi based clock provider and in which the spi controller
tries to access the CCF in the spi transfer function... From a quick look I can
already see that [1], [2], [3] and [4] could hit the same deadlock...


Honestly, I'm not sure what the fix is here, since when we look at the pieces individually
(CCF, SPI, SPI controller) there's nothing really wrong. The problem only appears when they
are combined... My naive thinking is that having something like 'spi_sync_nodefer();' would
be a way to prevent this (or just changing 'spi_sync()' so that it can never defer the
msg to the spi thread).

Looking at '__spi_pump_messages()' alone, I can see that this is probably not trivial though...

[1]: https://elixir.bootlin.com/linux/v5.14-rc2/source/drivers/spi/spi-tegra20-slink.c#L686
[2]: https://elixir.bootlin.com/linux/latest/source/drivers/spi/spi-sun6i.c#L353
[3]: https://elixir.bootlin.com/linux/latest/source/drivers/spi/spi-sun4i.c#L271
[4]: https://elixir.bootlin.com/linux/latest/source/drivers/spi/spi-qcom-qspi.c#L237

- Nuno Sá



* Re: [PATCH 0/1] spi: spi-bcm2835: Fix deadlock
  2021-07-21  6:47   ` Sa, Nuno
@ 2021-07-21 12:32     ` Mark Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Brown @ 2021-07-21 12:32 UTC (permalink / raw)
  To: Sa, Nuno
  Cc: Tachici, Alexandru, linux-kernel, linux-spi, nsaenz, f.fainelli,
	rjui, swarren, bcm-kernel-feedback-list, bootc


On Wed, Jul 21, 2021 at 06:47:01AM +0000, Sa, Nuno wrote:
> > To: Tachici, Alexandru <Alexandru.Tachici@analog.com>;
> >     linux-kernel@vger.kernel.org; linux-spi@vger.kernel.org
> > Cc: Mark Brown <broonie@kernel.org>; nsaenz@kernel.org;
> > f.fainelli@gmail.com; rjui@broadcom.com; swarren@wwwdotorg.org;
> > bcm-kernel-feedback-list@broadcom.com; bootc@bootc.net; Sa,
> > Nuno <Nuno.Sa@analog.com>
> > Subject: Re: [PATCH 0/1] spi: spi-bcm2835: Fix deadlock

Please delete unneeded context from mails when replying.  Doing this
makes it much easier to find your reply in the message, helping ensure
it won't be missed by people scrolling through the irrelevant quoted
material.

> I'm really curious about this one and how we should proceed. Maybe this is not
> new (just to me) and the way to go is just to "fix" the spi controller when we hit the
> issue? I'm asking because there's a more fundamental problem when these pieces
> are combined (CCF + SPI). What I mean is that this can potentially happen in every
> system that happens to have a spi based clock provider and in which the spi controller
> tries to access the CCF in the spi transfer function... From a quick look I can
> already see that [1], [2], [3] and [4] could hit the same deadlock...

The clock API just doesn't work very well for things on buses that might
sleep; I2C is another example. It's a long-standing general issue that
needs to be addressed in the clock framework, for example with finer
grained locking, but nobody has come up with anything yet.
