From: Arnd Bergmann <arnd@arndb.de>
To: Tony Lindgren <tony@atomide.com>
Cc: Yegor Yefremov <yegorslists@googlemail.com>,
    Ard Biesheuvel <ardb@kernel.org>,
    Arnd Bergmann <arnd@arndb.de>,
    Linux-OMAP <linux-omap@vger.kernel.org>,
    linux-clk <linux-clk@vger.kernel.org>,
    Stephen Boyd <sboyd@kernel.org>,
    Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: am335x: 5.18.x: system stalling
Date: Thu, 12 May 2022 10:14:15 +0200
Message-ID: <CAK8P3a3817c8JMd=vqCjmY_kvBshhzSetgMfEihZ-NdcVZgJpQ@mail.gmail.com>
In-Reply-To: <Ynyd9HeFNmGQiovY@atomide.com>

On Thu, May 12, 2022 at 7:41 AM Tony Lindgren <tony@atomide.com> wrote:
> Adding Ard and Arnd for vmap stack.

Thanks!

> * Yegor Yefremov <yegorslists@googlemail.com> [220511 14:16]:
> > On Thu, May 5, 2022 at 7:08 AM Tony Lindgren <tony@atomide.com> wrote:
> > > * Yegor Yefremov <yegorslists@googlemail.com> [220504 10:35]:
> > Maybe Ard and Arnd have some ideas what might be going wrong here.
> Basically anything trying to use a physical address on stack will
> fail in weird ways like we've seen for smc and wl1251.

For this, the first step should be to enable CONFIG_DMA_API_DEBUG.
If any device is getting the wrong DMA address for a stack variable,
this should print a helpful debug message to the console.

> > > > [ 88.408578] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > [ 88.415777] (detected by 0, t=2602 jiffies, g=2529, q=17)
> > > > [ 88.422026] rcu: All QSes seen, last rcu_sched kthread activity
> > > > 2602 (-21160--23762), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > > > [ 88.434445] rcu: rcu_sched kthread starved for 2602 jiffies! g2529
> > > > f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> > > > [ 88.445274] rcu: Unless rcu_sched kthread gets sufficient CPU
> > > > time, OOM is now expected behavior.
> > > > [ 88.454859] rcu: RCU grace-period kthread stack dump:

I looked for a smoking gun in the backtrace but didn't really find
anything, so I'm guessing the problem is something that happened
between the last timer tick and the time the rcu_gp_kthread actually
ran, maybe some DMA timeout in a device driver running with
interrupts disabled.

> > > > [ 88.807588] omap3_noncore_dpll_program from clk_change_rate+0x23c/0x4f8
> > > > [ 88.815375] clk_change_rate from clk_core_set_rate_nolock+0x1b0/0x29c
> > > > [ 88.822936] clk_core_set_rate_nolock from clk_set_rate+0x30/0x64
> > > > [ 88.830056] clk_set_rate from _set_opp+0x254/0x51c
> > > > [ 88.835835] _set_opp from dev_pm_opp_set_rate+0xec/0x228
> > > > [ 88.842073] dev_pm_opp_set_rate from __cpufreq_driver_target+0x584/0x700
> > > > [ 88.849792] __cpufreq_driver_target from od_dbs_update+0xb4/0x168
> > > > [ 88.856953] od_dbs_update from dbs_work_handler+0x2c/0x60
> > > > [ 88.863441] dbs_work_handler from process_one_work+0x284/0x72c
> > > > [ 88.870411] process_one_work from worker_thread+0x28/0x4b0
> > > > [ 88.876973] worker_thread from kthread+0xe4/0x104
> > > > [ 88.882692] kthread from ret_from_fork+0x14/0x28

The only thing I see that is slightly unusual here is that the timer
tick happened exactly during the cpufreq transition. Is this always
the same backtrace when you run into the bug? What happens when you
disable the omap3 cpufreq driver or set it to run at a fixed
frequency?

        Arnd
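[For readers following along: a minimal sketch of turning on the DMA
API debugging suggested above, assuming a standard kernel source tree
and a board with debugfs mounted. The scripts/config invocation and
the debugfs path are the usual upstream ones; check them against your
kernel version.]

```shell
# In the kernel source tree: enable DMA API debugging and rebuild.
./scripts/config --enable DMA_API_DEBUG
make olddefconfig

# After booting the new kernel, violations are printed to the console
# (dmesg); a running error count is also exposed via debugfs:
cat /sys/kernel/debug/dma-api/error_count
```

A mapping of a stack (vmalloc) address should then trigger a warning
such as "DMA-API: device driver maps memory from stack" pointing at
the offending driver.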
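[As a sketch of the "fixed frequency" experiment: the OPP can be
pinned from userspace through the standard cpufreq sysfs interface,
without rebuilding the kernel. The 600000 kHz value below is only an
example; pick one from the board's own list. Run as root on the
target.]

```shell
# Pin cpu0 to one OPP so od_dbs_update never triggers a transition.
cd /sys/devices/system/cpu/cpu0/cpufreq
cat scaling_available_frequencies      # list of valid values, in kHz
echo userspace > scaling_governor      # take the ondemand governor out
echo 600000 > scaling_setspeed         # example value; use one from the list
cat scaling_cur_freq                   # confirm the frequency is pinned
```

If the RCU stalls disappear with the frequency pinned, that points at
the DPLL reprogramming path in the backtrace above.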