From: "Théo Lebrun" <theo.lebrun@bootlin.com>
To: Mark Brown <broonie@kernel.org>, Apurva Nandan <a-nandan@ti.com>,
	 Dhruva Gole <d-gole@ti.com>
Cc: linux-spi@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Gregory CLEMENT" <gregory.clement@bootlin.com>,
	"Vladimir Kondratiev" <vladimir.kondratiev@mobileye.com>,
	"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
	"Tawfik Bayouk" <tawfik.bayouk@mobileye.com>,
	"Théo Lebrun" <theo.lebrun@bootlin.com>
Subject: [PATCH] spi: cadence-qspi: stop calling system-wide PM helpers for runtime PM
Date: Fri, 02 Feb 2024 18:29:40 +0100
Message-ID: <20240202-cdns-qspi-pm-fix-v1-1-3c8feb2bfdd8@bootlin.com>

The ->runtime_suspend() and ->runtime_resume() callbacks are not
expected to call spi_controller_suspend() and spi_controller_resume().
Remove calls to those in the cadence-qspi driver.

Those helpers have two roles currently:
 - They stop/start the queue, including dealing with the kworker.
 - They toggle the SPI controller SPI_CONTROLLER_SUSPENDED flag, which
   requires acquiring ctlr->bus_lock_mutex.

The cadence-qspi ->exec_op() implementation bumps the usage counter at
its start. It might therefore run our ->runtime_resume()
implementation. However, ctlr->bus_lock_mutex is acquired by
spi_mem_exec_op() while ->exec_op() is being called.

Here is a brief call tree highlighting the issue:

spi_mem_exec_op()
        ...
        spi_mem_access_start()
                mutex_lock(&ctlr->bus_lock_mutex)

        cqspi_exec_mem_op()
                pm_runtime_resume_and_get()
                        cqspi_resume()
                                spi_controller_resume()
                                        mutex_lock(&ctlr->bus_lock_mutex)
                ...

        spi_mem_access_end()
                mutex_unlock(&ctlr->bus_lock_mutex)
        ...

The result is a deadlock: spi_mem_exec_op() acquires
ctlr->bus_lock_mutex around every operation, and while the operation
runs we might runtime resume, which tries to acquire the same mutex
again from the same context.

Anyway, those helpers (spi_controller_{suspend,resume}) are aimed at
system-wide suspend and resume and should NOT be called at runtime
suspend & resume.

Side note: the previous implementation had a second issue. It obtained
pointers to both `struct cqspi_st` and `struct spi_controller` by
calling dev_get_drvdata() on the same device. Neither structure embeds
the other, so one of the two pointers necessarily had the wrong type.
This led to memory corruption that was hidden inside the big
cqspi->f_pdata array on my setup. It kept working until I tried changing
the array size to its theoretical max of 4, which led to the discovery
of this gnarly bug.
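
As a rough illustration of how such a stray write can hide inside a large
array, here is a small userspace sketch. The two structures are
hypothetical stand-ins with made-up layouts, not the real
`struct cqspi_st` or `struct spi_controller`. The write is simulated
through a byte buffer at the offset the wrongly typed pointer would have
used, so the example stays well-defined C.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the driver state; layout is made up. */
struct fake_cqspi {
	int current_cs;
	unsigned char f_pdata[64]; /* plays the role of the big array */
};

/* Hypothetical stand-in for the controller; layout is made up. */
struct fake_controller {
	long pad[4];
	unsigned int flags; /* plays the role of the toggled flag field */
};

/*
 * Simulate writing fake_controller.flags through a pointer that really
 * refers to a fake_cqspi. Returns how many f_pdata bytes were changed.
 */
static int simulate_stray_write(void)
{
	struct fake_cqspi cqspi;
	unsigned char raw[sizeof(struct fake_cqspi)];
	unsigned int flag = 0xffffffffu;
	size_t off = offsetof(struct fake_controller, flags);
	size_t i;
	int changed = 0;

	memset(&cqspi, 0, sizeof(cqspi));

	/* Out of bounds for this toy layout: nothing to demonstrate. */
	if (off + sizeof(flag) > sizeof(raw))
		return 0;

	memcpy(raw, &cqspi, sizeof(raw));
	memcpy(raw + off, &flag, sizeof(flag)); /* the stray write */
	memcpy(&cqspi, raw, sizeof(raw));

	for (i = 0; i < sizeof(cqspi.f_pdata); i++)
		if (cqspi.f_pdata[i])
			changed++;

	return changed;
}
```

With these toy layouts the stray write lands entirely inside f_pdata, so
the neighbouring current_cs field is untouched and nothing looks wrong
until the array shrinks or its contents start to matter.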

Fixes: 0578a6dbfe75 ("spi: spi-cadence-quadspi: add runtime pm support")
Fixes: 2087e85bb66e ("spi: cadence-quadspi: fix suspend-resume implementations")
Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
Hi,

This is a draft patch highlighting a serious bug in the
->runtime_suspend() and ->runtime_resume() implementations of
cadence-qspi. Seeing how runtime PM and autosuspend are enabled by
default, I believe this affects all users of the driver.

I've tried my best to be exhaustive in the commit message. Have I missed
something that could explain how the current implementations could have
been functional in the last few revisions of the kernel?

The MIPS platform at hand, used for debugging and testing, is currently
not supported by the driver. It is the Mobileye EyeQ5 [0]. No code
changes are required for support, only a new compatible and appropriate
match data + flags. That will come later, with some performance-related
patches.

In short: feedback from maintainers and others who know the driver and
subsystem would be useful to move this forward.

Thanks all,
Théo

[0]: https://lore.kernel.org/lkml/20240118155252.397947-1-gregory.clement@bootlin.com/
---
 drivers/spi/spi-cadence-quadspi.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
index 74647dfcb86c..72f80c77ee35 100644
--- a/drivers/spi/spi-cadence-quadspi.c
+++ b/drivers/spi/spi-cadence-quadspi.c
@@ -1927,24 +1927,18 @@ static void cqspi_remove(struct platform_device *pdev)
 	pm_runtime_disable(&pdev->dev);
 }
 
-static int cqspi_suspend(struct device *dev)
+static int cqspi_runtime_suspend(struct device *dev)
 {
 	struct cqspi_st *cqspi = dev_get_drvdata(dev);
-	struct spi_controller *host = dev_get_drvdata(dev);
-	int ret;
 
-	ret = spi_controller_suspend(host);
 	cqspi_controller_enable(cqspi, 0);
-
 	clk_disable_unprepare(cqspi->clk);
-
-	return ret;
+	return 0;
 }
 
-static int cqspi_resume(struct device *dev)
+static int cqspi_runtime_resume(struct device *dev)
 {
 	struct cqspi_st *cqspi = dev_get_drvdata(dev);
-	struct spi_controller *host = dev_get_drvdata(dev);
 
 	clk_prepare_enable(cqspi->clk);
 	cqspi_wait_idle(cqspi);
@@ -1953,11 +1947,11 @@ static int cqspi_resume(struct device *dev)
 	cqspi->current_cs = -1;
 	cqspi->sclk = 0;
 
-	return spi_controller_resume(host);
+	return 0;
 }
 
-static DEFINE_RUNTIME_DEV_PM_OPS(cqspi_dev_pm_ops, cqspi_suspend,
-				 cqspi_resume, NULL);
+static DEFINE_RUNTIME_DEV_PM_OPS(cqspi_dev_pm_ops, cqspi_runtime_suspend,
+				 cqspi_runtime_resume, NULL);
 
 static const struct cqspi_driver_platdata cdns_qspi = {
 	.quirks = CQSPI_DISABLE_DAC_MODE,

---
base-commit: 27470aa9b51a348f7edfb99641b5a9004f81e3e6
change-id: 20240202-cdns-qspi-pm-fix-29600cc6d7bf

Best regards,
-- 
Théo Lebrun <theo.lebrun@bootlin.com>


