linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jon Hunter <jonathanh@nvidia.com>
To: <kernel@martin.sperl.org>
Cc: Mark Brown <broonie@kernel.org>,
	linux-tegra <linux-tegra@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	<linux-spi@vger.kernel.org>
Subject: Re: Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed
Date: Wed, 16 Jan 2019 10:58:16 +0000	[thread overview]
Message-ID: <7ffb7c42-a7e1-edf2-2d0e-86d2240b5c9e@nvidia.com> (raw)
In-Reply-To: <AAA7943B-B1F1-4389-AAC3-8621EC6E38B8@martin.sperl.org>


On 15/01/2019 17:39, kernel@martin.sperl.org wrote:
> Hi John!
> 
>> On 15.01.2019, at 15:26, Jon Hunter <jonathanh@nvidia.com> wrote:
>>> Looks as if there is something missing in spi_stop_queue that 
>>> would wake the worker thread one last time without any delays
>>> and finish the hw shutdown immediately - it runs as a delayed
>>> task...
>>>
>>> One question: do you run any spi transfers in
>>> your test case before suspend?
>>
>> No and before suspending I dumped some of the spi stats and I see no
>> tranfers/messages at all ...
>>
>> Stats for spi1 ...
>> Bytes: 0
>> Errors: 0
>> Messages: 0
>> Transfers: 0
> 
> This...
> 
>>> /sys/class/spi_master/spi1/statistics/messages gives some
>>> counters on the number of spi messages processed which
>>> would give you an indication if that is happening.
>>>
>>> It could be as easy as adding right after the first lock 
>>> in spi_stop_queue:
>>> kthread_mod_delayed_work(&ctlr->kworker,
>>>  &ctlr->pump_idle_teardown, 0);
>>> (plus maybe a yield or similar to allow the worker to 
>>> quickly/reliably run on a single core machine)
>>>
>>> I hope that this initial guess helps.
>>
>> Unfortunately, the above did not help and the issue persists.
>>
>> Digging a bit deeper I see that now the 'ctlr->queue' is empty but
>> 'ctlr->busy' flag is set and this is causing the 'could not stop message
>> queue' error.
>>
>> It seems that __spi_pump_messages() is getting called several times
>> during boot when registering the spi-flash, then after the spi-flash has
>> been registered, about a 1 sec later spi_pump_idle_teardown() is called
>> (as expected), but exits because 'ctlr->running' is true. However,
> 
> and this contradicts each other!
> If there is a call to message pump, then we should process a message
> and the counters should increase.
> 
> If those counters do not increase then there is something strange.

Actually no, this is nothing strange here at all. If you look Tegra has
it's own 'transfer_one_message' handler (tegra_spi_transfer_one_message
in drivers/spi/spi-tegra114.c) and this does not update any of the
statistics. Looks like the statistics are not updated if a SPI
controller has it own transfer_one_message hander which is a shame.

> If we are called without anything to do then the pump should trigger a
> tear down and stop.

>> spi_pump_idle_teardown() is never called again and when we suspend we
>> are stuck in the busy/running state. In this case should something be
>> scheduling spi_pump_idle_teardown() again? Although even if it does I
>> don't see where the busy flag would be cleared in this path?
>>
> 
> I am wondering where busy/running would be set in the first place
> if there are no counters...

Looks like running is set true on initialising the queue and busy is set
true in __spi_pump_messages().

> Is it possible that the specific flash is not using the “normal” 
> spi_pump_message, but spi_controller_mem_ops operations? 

I definitely see __spi_pump_messages() being called which in turn calls
tegra_spi_transfer_one_message().

> Maybe we are missing the teardown in that driver or something is
> changing flags there.
> 
> grepping a bit:
> 
> I see spi_mem_access_start calling spi_flush_queue, which is calling
> pump_message, which - if there is no transfer - will trigger a delayed
> tear-down, while it maybe should not be doing that.
> 
> Maybe spi_mem_access_end needs a call to spi_flush_queue as well?
> 
> Unfortunately this is theoretical work and quite a lot of guesswork
> without an actual device available for testing...

Looking some more I think that the problem is ...

1. The controller is put into the 'running' state when spi_start_queue()
is called (which itself is called during the registration of the spi
controller).
2. After sending some messages to initialise the SPI flash,
spi_pump_idle_teardown() is called but exits early because the
controller is in the 'running' state and this prevents 'busy' from being
cleared.
3. On entering suspend spi_stop_queue() is called but the controller is
still in the 'busy' state and so fails to stop.

So the problem I see is that 'busy' is never cleared and before it was.
Furthermore, because 'busy' is never cleared, I see that the SPI
controller is never pm-runtime suspended and cannot be suspended, and so
this will have an impact on power.

With your patch after booting I see ...

$ cat /sys/class/spi_master/spi1/device/power/runtime_status
active

Reverting your patch, after booting I see ...

$ cat /sys/class/spi_master/spi1/device/power/runtime_status
suspended

Does the device you are testing with support runtime-pm? If so what do
you see?

Cheers
Jon

-- 
nvpublic

  parent reply	other threads:[~2019-01-16 10:58 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-14 15:35 Regression: spi: core: avoid waking pump thread from spi_sync instead run teardown delayed Jon Hunter
     [not found] ` <7C4A5EFC-8235-40C8-96E1-E6020529DF72@martin.sperl.org>
2019-01-15 14:26   ` Jon Hunter
2019-01-15 15:10     ` Mark Brown
2019-01-15 16:09       ` Jon Hunter
2019-01-15 19:27         ` Mark Brown
2019-01-15 17:39     ` kernel
2019-01-15 19:26       ` Mark Brown
2019-01-15 20:58         ` Martin Sperl
2019-01-15 21:25           ` Mark Brown
2019-01-16 11:01             ` Jon Hunter
2019-01-18 17:11             ` kernel
2019-01-18 19:12               ` Mark Brown
2019-01-20 11:24                 ` kernel
2019-01-23 17:56                   ` Mark Brown
2019-05-09 19:47                     ` Martin Sperl
2019-05-12  8:54                       ` Mark Brown
2019-01-16 10:58       ` Jon Hunter [this message]
2019-01-22  9:36 ` Geert Uytterhoeven
     [not found]   ` <CGME20190123082650eucas1p243ce21346a00e9b3e9eed2863cd3d280@eucas1p2.samsung.com>
2019-01-23  8:26     ` Marek Szyprowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ffb7c42-a7e1-edf2-2d0e-86d2240b5c9e@nvidia.com \
    --to=jonathanh@nvidia.com \
    --cc=broonie@kernel.org \
    --cc=kernel@martin.sperl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-spi@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).