linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Alexey Klimov <alexey.klimov@linaro.org>
Cc: draszik@google.com, peter.griffin@linaro.org,
	willmcvicker@google.com, mingo@kernel.org,
	ulf.hansson@linaro.org, tony@atomide.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	axboe@kernel.dk, alim.akhtar@samsung.com,
	regressions@lists.linux.dev, avri.altman@wdc.com,
	bvanassche@acm.org, klimova@google.com
Subject: Re: [REGRESSION] CPUIDLE_FLAG_RCU_IDLE, blk_mq_freeze_queue_wait() and slow-stuck reboots
Date: Mon, 20 Mar 2023 10:36:14 +0100
Message-ID: <20230320093614.GB2196776@hirez.programming.kicks-ass.net>
In-Reply-To: <20230320090558.GF2194297@hirez.programming.kicks-ass.net>

On Mon, Mar 20, 2023 at 10:05:58AM +0100, Peter Zijlstra wrote:
> On Fri, Mar 17, 2023 at 02:11:25AM +0000, Alexey Klimov wrote:
> > On Wed, 15 Mar 2023 at 11:16, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > >
> > > (could you wrap your email please)
> > 
> > Ouch. Sorry.
> > 
> > > On Tue, Mar 14, 2023 at 11:00:04PM +0000, Alexey Klimov wrote:
> > > > #regzbot introduced: 0c5ffc3d7b15 #regzbot title:
> > > > CPUIDLE_FLAG_RCU_IDLE, blk_mq_freeze_queue_wait() and slow-stuck
> > > > reboots
> > > >
> > > > The upstream changes are being merged into android-mainline repo and
> > > > at some point we started to observe kernel panics on reboot or long
> > > > reboot times.
> > >
> > > On what hardware? I find it somewhat hard to follow this DT code :/
> > 
> > Pixel 6.
> 
> What actual cpuidle driver is that thing using? Is there any out-of-tree
> code involved? Mark tells me anything arm64 should be using PSCI, so let
> me stare hard at that again.

So specifically, your problem sounds like synchronize_rcu() is taking
very much longer than it used to. Combined with the patch that makes it
'go away', this seems to indicate you lost a ct_cpuidle_enter() call,
which is what ends up telling RCU the CPU is idle and no longer takes
part in the whole grace period machinery. Without that call, RCU waits
for the idle CPU to report back on its RCU progress, but a CPU that is
idle is not going to do that, so things sorta wait around until RCU
gets fed up and starts spraying IPIs to try and get things moving.
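
To make the waiting argument concrete, here is a deliberately tiny
userspace model, not kernel code. It assumes a grace period may end once
every CPU has either told RCU it is idle or reported a quiescent state
on its own. The names gp_cpu and gp_holdouts() are made up for this
sketch, and real RCU tracks this state very differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU view as seen by the grace-period machinery. */
struct gp_cpu {
	bool told_rcu_idle;	/* a ct_cpuidle_enter() analogue ran on this CPU */
	bool reported_qs;	/* the CPU reported a quiescent state by itself */
};

/*
 * Count the CPUs the grace period still has to wait for: those that
 * neither announced idleness nor reported a quiescent state.
 */
size_t gp_holdouts(const struct gp_cpu *cpus, size_t n)
{
	size_t holdouts = 0;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].told_rcu_idle && !cpus[i].reported_qs)
			holdouts++;
	}
	return holdouts;
}
```

In this model a CPU that is really idle but has both fields false stays
a holdout indefinitely, which is the stall described above; in the real
kernel RCU eventually forces the issue instead.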


Now...  if a driver sets CPUIDLE_FLAG_RCU_IDLE it promises to call
ct_cpuidle_{enter,exit}() itself. Hence for any driver that does *NOT*
set that flag, cpuidle_enter_state() calls these functions.
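
The flag contract can be sketched as a small userspace model. It only
mirrors the rule stated above: either the core or the driver makes the
ct_cpuidle_{enter,exit}() calls, never both and never neither. Every
model_* name and the flag value are hypothetical stand-ins, not the
kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CPUIDLE_FLAG_RCU_IDLE; the value is arbitrary. */
#define MODEL_FLAG_RCU_IDLE 0x1u

/* Counts how often RCU was told this CPU went idle and came back. */
struct model_cpu {
	int rcu_idle_enters;
	int rcu_idle_exits;
};

void model_ct_cpuidle_enter(struct model_cpu *cpu) { cpu->rcu_idle_enters++; }
void model_ct_cpuidle_exit(struct model_cpu *cpu) { cpu->rcu_idle_exits++; }

/* Driver callback that leaves the RCU notification to the core. */
void model_plain_enter(struct model_cpu *cpu) { (void)cpu; }

/* Driver callback for a state that sets the flag and keeps the promise. */
void model_rcu_aware_enter(struct model_cpu *cpu)
{
	model_ct_cpuidle_enter(cpu);
	/* the low-level idle entry would sit here */
	model_ct_cpuidle_exit(cpu);
}

/* Driver callback for a state that sets the flag but forgets the calls. */
void model_broken_enter(struct model_cpu *cpu) { (void)cpu; }

/* Model of the core: notify RCU only when the state did not set the flag. */
void model_cpuidle_enter_state(struct model_cpu *cpu, unsigned int flags,
			       void (*enter)(struct model_cpu *cpu))
{
	bool core_notifies = !(flags & MODEL_FLAG_RCU_IDLE);

	if (core_notifies)
		model_ct_cpuidle_enter(cpu);
	enter(cpu);
	if (core_notifies)
		model_ct_cpuidle_exit(cpu);
}

/* Run one idle entry and return how many times RCU was told about it. */
int model_enters_for(unsigned int flags, void (*enter)(struct model_cpu *cpu))
{
	struct model_cpu cpu = { 0, 0 };

	model_cpuidle_enter_state(&cpu, flags, enter);
	assert(cpu.rcu_idle_enters == cpu.rcu_idle_exits);
	return cpu.rcu_idle_enters;
}
```

A state that sets the flag but never makes the calls yields a count of
zero here, which is the 'lost call' scenario suspected in this thread.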

Now, for PSCI, the DT handler is psci_enter_idle_state(), which uses
CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(), which per the other email, means that
its low_level_idle_enter := psci_cpu_suspend_enter() *will* call
ct_cpuidle_{enter,exit}().

Then if we look at psci_cpu_suspend_enter(), it has two cases depending
on psci_power_state_loses_context(). If it doesn't lose context it does
ct_cpuidle_enter() right there and proceeds to call
psci_ops.cpu_suspend() -- whatever that does.

If it does lose context, then it depends on CONFIG_ARM64: on arm64 we do
not call ct_cpuidle_{enter,exit}() but proceed into cpu_suspend().

We can find that function in arch/arm64/kernel/suspend.c, and if you
look at it, you'll note it does in fact call ct_cpuidle_{enter,exit}()
as per promises made.

So AFAICT every path into idle will pass through ct_cpuidle_enter().
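
The case analysis above can be written down as a rough userspace model
to check that each branch notifies RCU exactly once. This is a sketch,
not the kernel source: all psci_model_* names are invented, and the
non-arm64 branch (the caller makes the calls around cpu_suspend()) is
inferred from the 'depends on CONFIG_ARM64' remark rather than spelled
out in this email.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters standing in for "RCU was told this CPU is idle / is back". */
struct psci_model {
	int ct_enter_calls;
	int ct_exit_calls;
};

void psci_model_ct_enter(struct psci_model *m) { m->ct_enter_calls++; }
void psci_model_ct_exit(struct psci_model *m) { m->ct_exit_calls++; }

/* Stand-in for psci_ops.cpu_suspend(): no RCU bookkeeping of its own. */
void psci_model_fw_suspend(struct psci_model *m) { (void)m; }

/*
 * Stand-in for cpu_suspend(): in this model only the arm64 variant does
 * the ct_cpuidle bookkeeping itself (assumption for the non-arm64 case).
 */
void psci_model_cpu_suspend(struct psci_model *m, bool is_arm64)
{
	if (is_arm64)
		psci_model_ct_enter(m);
	/* context save and the actual suspend would happen here */
	if (is_arm64)
		psci_model_ct_exit(m);
}

/* Model of the two cases; returns how often RCU was told about idle. */
int psci_model_suspend_enter(bool loses_context, bool is_arm64)
{
	struct psci_model m = { 0, 0 };

	if (!loses_context) {
		/* Context retained: notify RCU right here. */
		psci_model_ct_enter(&m);
		psci_model_fw_suspend(&m);
		psci_model_ct_exit(&m);
	} else {
		/* Context lost: on arm64, cpu_suspend() notifies RCU. */
		if (!is_arm64)
			psci_model_ct_enter(&m);
		psci_model_cpu_suspend(&m, is_arm64);
		if (!is_arm64)
			psci_model_ct_exit(&m);
	}

	assert(m.ct_enter_calls == m.ct_exit_calls);
	return m.ct_enter_calls;
}
```

All four combinations of the two booleans return 1 in this model, which
matches the conclusion that no in-tree path skips the notification.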




