From: Alexey Klimov <alexey.klimov@linaro.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: draszik@google.com, peter.griffin@linaro.org,
	willmcvicker@google.com, mingo@kernel.org,
	ulf.hansson@linaro.org, tony@atomide.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	axboe@kernel.dk, alim.akhtar@samsung.com,
	regressions@lists.linux.dev, avri.altman@wdc.com,
	bvanassche@acm.org, klimova@google.com
Subject: Re: [REGRESSION] CPUIDLE_FLAG_RCU_IDLE, blk_mq_freeze_queue_wait() and slow-stuck reboots
Date: Fri, 17 Mar 2023 02:11:25 +0000
Message-ID: <CANgGJDpd4Gm5HhQW__oMAv1yUqSPZ7FSGoQLYTmug=TUk4cn4g@mail.gmail.com>
In-Reply-To: <20230315111606.GB2006103@hirez.programming.kicks-ass.net>

On Wed, 15 Mar 2023 at 11:16, Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> (could you wrap your email please)

Ouch. Sorry.

> On Tue, Mar 14, 2023 at 11:00:04PM +0000, Alexey Klimov wrote:
> > #regzbot introduced: 0c5ffc3d7b15 #regzbot title:
> > CPUIDLE_FLAG_RCU_IDLE, blk_mq_freeze_queue_wait() and slow-stuck
> > reboots
> >
> > The upstream changes are being merged into android-mainline repo and
> > at some point we started to observe kernel panics on reboot or long
> > reboot times.
>
> On what hardware? I find it somewhat hard to follow this DT code :/

Pixel 6.

> > Looks like adding CPUIDLE_FLAG_RCU_IDLE flag to idle driver caused
> > this behaviour.  The minimal change that is required for this system
> > to avoid the regression would be a one-liner that removes the flag
> > (below).
> >
> > But if it is a real regression, then other idle drivers, if used, will
> > likely cause this regression too with the same ufshcd driver. There is
> > also a suspicion that CPUIDLE_FLAG_RCU_IDLE just revealed or uncovered
> > some other problem.
> >
> > Any thoughts on this?
>
> So ARM has a weird 'rule' in that idle state 0 (wfi) should not have
> RCU_IDLE set, while others should have.
>
> Of the dt_init_idle_driver() users:
>
>  - cpuidle-arm: arm_enter_idle_state()
>  - cpuidle-big_little: bl_enter_powerdown() does ct_cpuidle_{enter,exit}()
>  - cpuidle-psci: psci_enter_idle_state() uses CPU_PM_CPU_IDLE_ENTER_PARAM_RCU()
>  - cpuidle-qcom-spm: spm_enter_idle_state() uses CPU_PM_CPU_IDLE_ENTER_PARAM()
>  - cpuidle-riscv-sbi: sbi_cpuidle_enter_state() uses CPU_PM_CPU_IDLE_ENTER_*_PARAM()
>
> All of them start on index 1 and hence should have RCU_IDLE set, but at
> least the arm, qcom-spm and riscv-sbi don't actually appear to abide by
> the rules.
>
> Fixing that gives me the below; does that help?
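
For reference, the index rule described above can be modelled in a few
lines of standalone C. This is only a sketch of the rule as stated in
this thread (state 0 must not carry RCU_IDLE, every later state must),
not kernel code: MODEL_FLAG_RCU_IDLE, struct model_idle_state and both
helper functions are hypothetical names, and the bit value is arbitrary
rather than the kernel's CPUIDLE_FLAG_RCU_IDLE definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUIDLE_FLAG_RCU_IDLE; bit value is illustrative. */
#define MODEL_FLAG_RCU_IDLE (1u << 0)

/* Hypothetical, reduced model of one idle state: a name and its flags. */
struct model_idle_state {
	const char *name;
	unsigned int flags;
};

/*
 * Rule as described in the thread: index 0 (wfi) should NOT have
 * RCU_IDLE set, every other index should.
 */
static bool model_state_follows_rule(size_t index, unsigned int flags)
{
	bool has_rcu_idle = (flags & MODEL_FLAG_RCU_IDLE) != 0;

	return index == 0 ? !has_rcu_idle : has_rcu_idle;
}

/* Count how many states in a driver's table break the rule. */
static size_t model_count_violations(const struct model_idle_state *states,
				     size_t count)
{
	size_t violations = 0;

	assert(states != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (!model_state_follows_rule(i, states[i].flags))
			violations++;
	}
	return violations;
}
```

Calling model_count_violations() on a table whose entry 0 has no flag
and whose later entries all have MODEL_FLAG_RCU_IDLE returns 0; any
other combination is counted as a violation of the rule.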

Double-checked and, unfortunately, the patch doesn't seem to change
the behaviour at all.
The first problematic driver is ufshcd, which slows down the reboot the
most. The other one is the wlan bcm driver, whose callback is called
from blocking_notifier_call_chain(...).
Backtraces from it, when it is stuck/slow, involve the pci and net
subsystems, but I haven't yet narrowed it down to an exact function or
specific flow.
The patch from Bart helps with the ufshcd driver, but reboot times are
still 10-20 seconds.
Removing the RCU_IDLE flag helps with both drivers.

Is there any debug data I can collect to help with this, or are there
any other patches to test?

Thanks,
Alexey
