All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Théo Lebrun" <theo.lebrun@bootlin.com>
To: "Miquel Raynal" <miquel.raynal@bootlin.com>
Cc: "Mark Brown" <broonie@kernel.org>,
	"Apurva Nandan" <a-nandan@ti.com>, "Dhruva Gole" <d-gole@ti.com>,
	<linux-spi@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	"Gregory CLEMENT" <gregory.clement@bootlin.com>,
	"Vladimir Kondratiev" <vladimir.kondratiev@mobileye.com>,
	"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
	"Tawfik Bayouk" <tawfik.bayouk@mobileye.com>
Subject: Re: [PATCH] spi: cadence-qspi: stop calling system-wide PM helpers for runtime PM
Date: Mon, 05 Feb 2024 11:03:44 +0100	[thread overview]
Message-ID: <CYX260CKXOUN.2H1DC1TG1Q6TY@bootlin.com> (raw)
In-Reply-To: <20240205100312.6f0f40db@xps-13>

Hi,

On Mon Feb 5, 2024 at 10:03 AM CET, Miquel Raynal wrote:
> Hello Théo,
>
> theo.lebrun@bootlin.com wrote on Fri, 02 Feb 2024 18:29:40 +0100:
>
> > The ->runtime_suspend() and ->runtime_resume() callbacks are not
> > expected to call spi_controller_suspend() and spi_controller_resume().
> > Remove calls to those in the cadence-qspi driver.
> > 
> > Those helpers have two roles currently:
> >  - They stop/start the queue, including dealing with the kworker.
> >  - They toggle the SPI controller SPI_CONTROLLER_SUSPENDED flag. It
> >    requires acquiring ctlr->bus_lock_mutex.
> > 
> > The cadence-qspi ->exec_op() implementation bumps the usage counter at
> > its start. It might therefore run our ->runtime_resume()
> > implementation. However, ctlr->bus_lock_mutex is acquired by
> > spi_mem_exec_op() while ->exec_op() is being called.
> > 
> > Here is a brief call tree highlighting the issue:
> > 
> > spi_mem_exec_op()
> >         ...
> >         spi_mem_access_start()
> >                 mutex_lock(&ctlr->bus_lock_mutex)
> > 
> >         cqspi_exec_mem_op()
> >                 pm_runtime_resume_and_get()
> >                         cqspi_resume()
> >                                 spi_controller_resume()
> >                                         mutex_lock(&ctlr->bus_lock_mutex)
> >                 ...
> > 
> >         spi_mem_access_end()
> >                 mutex_unlock(&ctlr->bus_lock_mutex)
> >         ...
> > 
> > The fatal conclusion of this is a deadlock: we acquire a lock on each
> > operation but while running the operation, we might want to runtime
> > resume and acquire the same lock.
> > 
> > Anyway, those helpers (spi_controller_{suspend,resume}) are aimed at
> > system-wide suspend and resume and should NOT be called at runtime
> > suspend & resume.
> > 
> > Side note: the previous implementation had a second issue. It acquired a
> > pointer to both `struct cqspi_st` and `struct spi_controller` using
> > dev_get_drvdata(). Neither embed the other. This lead to memory
> > corruption that was being hidden inside the big cqspi->f_pdata array on
> > my setup. It was working until I tried changing the array side to its
> > theorical max of 4, which lead to the discovery of this gnarly bug.
> > 
> > Fixes: 0578a6dbfe75 ("spi: spi-cadence-quadspi: add runtime pm support")
> > Fixes: 2087e85bb66e ("spi: cadence-quadspi: fix suspend-resume implementations")
>
> Your commit log makes total sense but I believe the diff is gonna break
> again the suspend to RAM operation. This is only my understanding
> right after quickly going through the whole story, so maybe I'm
> totally off topic.

The current ->runtime_suspend() implementation would indeed (probably)
work for suspend-to-RAM if it wasn't for the wrong pointers to cqspi
and spi_controller (see side note from commit message).

I've not found a moment where `struct cqspi_st` embed `struct
spi_controller` at its start, so I do not believe this has ever worked.
It might be the result of a mistake while porting a patch from a branch
that included other changes.

> What happened if I understand the two commits blamed above:
>
> - There were PM hooks.
> - Someone turned them into runtime PM hooks (breaking regular
>   suspend/resume).
> - Someone else added the "missing" suspend/resume logic inside the
>   runtime PM hooks to fix suspend and resume.
> - You are removing this logic because it leads to deadlocks.
>
> There was likely a misconception of what is expected in both cases
> (quick and small power savings vs. full power cycle/loosing the whole
> configuration).
>
> I would propose instead to create two distinct set of functions:
> - One for runtime PM
> - One for suspend/resume
> This way the runtime PM no longer deadlocks and people using
> suspend/resume won't get affected? I don't know if your runtime hooks
> *will* always be called during a suspend/resume. I hope so, which would
> make the split quite easy and without any code duplication.

That does indeed sound like the right approach. Runtime hooks can be
called from suspend/resume if needs be. Runtime PM then gets disabled
at the late stage.

I do not believe currently system-wide suspend can be working.
spi_controller_{suspend,resume} are being called with a bogus pointer.
This makes me ask: should the system-wide suspend/resume part be
addressed with this patch or a follow-up? It feels like a separate
concern to me.

The nice thing is that I have easy access to J7200, which uses the same
controller and supports suspend-to-RAM. That should make it a good test
setup.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

  reply	other threads:[~2024-02-05 10:03 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-02 17:29 [PATCH] spi: cadence-qspi: stop calling system-wide PM helpers for runtime PM Théo Lebrun
2024-02-05  9:03 ` Miquel Raynal
2024-02-05 10:03   ` Théo Lebrun [this message]
2024-02-05 10:12     ` Miquel Raynal
2024-02-05 10:38       ` Dhruva Gole

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CYX260CKXOUN.2H1DC1TG1Q6TY@bootlin.com \
    --to=theo.lebrun@bootlin.com \
    --cc=a-nandan@ti.com \
    --cc=broonie@kernel.org \
    --cc=d-gole@ti.com \
    --cc=gregory.clement@bootlin.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-spi@vger.kernel.org \
    --cc=miquel.raynal@bootlin.com \
    --cc=tawfik.bayouk@mobileye.com \
    --cc=thomas.petazzoni@bootlin.com \
    --cc=vladimir.kondratiev@mobileye.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.