From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Anson Huang <anson.huang@nxp.com>, Jacky Bai <ping.bai@nxp.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Linux PM <linux-pm@vger.kernel.org>
Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
Date: Thu, 21 Nov 2019 11:37:50 +0100
Message-ID: <CAJZ5v0iqcU6A_tMNmxpcr3roEyw268fvM+tTUUg7fn9vALQwtg@mail.gmail.com>
In-Reply-To: <20191121093557.bycvdo4xyinbc5cb@vireshk-i7>

On Thu, Nov 21, 2019 at 10:36 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> +Rafael and PM list.
>
> Please provide output of following for your platform while I am having a look at
> your problem.
>
> grep . /sys/devices/system/cpu/cpufreq/*/*
>
> On 21-11-19, 09:17, Anson Huang wrote:
> > Hi, Viresh
> > Sorry to bother you directly via mail.
> > We met something wrong with cpufreq governor driver during CPU hot-plug stress
> > test on v5.4-rc7, below is the log, from debug, looks like the irq_work is
> > still pending on a CPU which is already offline,

I'm really not sure if this conclusion can be drawn from the log below.

> > so next CPU being on/off line
> > will call the cpufreq_stop_governor and it will wait for previous irq_work
> > free for use, but since it is pending on an offline CPU, it will never
> > succeed. Do you have any idea of it or have any info about it? Thanks a lot in
> > advance!

So this is just a blocked __cpuhp_kick_ap() waiting forever, AFAICS.

> >
> > [ 1062.437497] smp_test.sh D 0 584 477 0x00000200
> > [ 1062.442986] Call trace:
> > [ 1062.445445] __switch_to+0xb4/0x200
> > [ 1062.448937] __schedule+0x304/0x5b0
> > [ 1062.452423] schedule+0x40/0xd0
> > [ 1062.455565] schedule_timeout+0x16c/0x268
> > [ 1062.459574] wait_for_completion+0xa0/0x120
> > [ 1062.463758] __cpuhp_kick_ap+0x54/0x68
> > [ 1062.467506] cpuhp_kick_ap+0x38/0xa8
> > [ 1062.471079] bringup_cpu+0xbc/0xe8
> > [ 1062.474480] cpuhp_invoke_callback+0x88/0x1f8
> > [ 1062.478835] _cpu_up+0xe8/0x1e0
> > [ 1062.481975] do_cpu_up+0x98/0xb8
> > [ 1062.485202] cpu_up+0x10/0x18
> > [ 1062.488172] cpu_subsys_online+0x48/0xa0
> > [ 1062.492094] device_online+0x68/0xb0
> > [ 1062.495667] online_store+0xa8/0xb8
> > [ 1062.499157] dev_attr_store+0x14/0x28
> > [ 1062.502820] sysfs_kf_write+0x48/0x58
> > [ 1062.506480] kernfs_fop_write+0xe0/0x1f8
> > [ 1062.510404] __vfs_write+0x18/0x40
> > [ 1062.513806] vfs_write+0x19c/0x1f0
> > [ 1062.517206] ksys_write+0x64/0xf0
> > [ 1062.520520] __arm64_sys_write+0x14/0x20
> > [ 1062.524446] el0_svc_common.constprop.2+0xb0/0x168
> > [ 1062.529236] el0_svc_handler+0x20/0x80
> > [ 1062.532984] el0_svc+0x8/0xc
> > [ 1062.535868] kworker/0:2 D 0 5496 2 0x00000228
> > [ 1062.541361] Workqueue: events vmstat_shepherd
> > [ 1062.545717] Call trace:
> > [ 1062.548163] __switch_to+0xb4/0x200
> > [ 1062.551650] __schedule+0x304/0x5b0
> > [ 1062.555137] schedule+0x40/0xd0
> > [ 1062.558278] rwsem_down_read_slowpath+0x200/0x4c0
> > [ 1062.562984] __down_read+0x9c/0xc0
> > [ 1062.566385] __percpu_down_read+0x6c/0xd8
> > [ 1062.570393] cpus_read_lock+0x70/0x78
> > [ 1062.574054] vmstat_shepherd+0x30/0xd0
> > [ 1062.577804] process_one_work+0x1dc/0x370
> > [ 1062.581812] worker_thread+0x48/0x468
> > [ 1062.585475] kthread+0xf0/0x120
> > [ 1062.588615] ret_from_fork+0x10/0x1c
> >
> > [ 1311.095934] sysrq: Show backtrace of all active CPUs
> > [ 1311.100913] sysrq: CPU0:
> > [ 1311.103450] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7-03224-g9b510305bd68-dirty #28
> > [ 1311.111972] Hardware name: NXP i.MX8MNano DDR4 EVK board (DT)
> > [ 1311.117717] pstate: 40000005 (nZcv daif -PAN -UAO)
> > [ 1311.122519] pc : arch_cpu_idle+0x10/0x18
> > [ 1311.126445] lr : default_idle_call+0x18/0x30
> > [ 1311.130712] sp : ffff800011f73ec0
> > [ 1311.134024] x29: ffff800011f73ec0 x28: 0000000041dd0018
> > [ 1311.139335] x27: 0000000000000400 x26: 0000000000000000
> > [ 1311.144646] x25: 0000000000000000 x24: ffff800011f7a220
> > [ 1311.149956] x23: ffff800011f79000 x22: ffff800011b42bf8
> > [ 1311.155267] x21: ffff800011f7a000 x20: 0000000000000001
> > [ 1311.160578] x19: ffff800011f7a120 x18: 0000000000000000
> > [ 1311.165888] x17: 0000000000000000 x16: 0000000000000000
> > [ 1311.171198] x15: 0000000000000000 x14: 0000000000000000
> > [ 1311.176508] x13: 0000000000000001 x12: 0000000000000000
> > [ 1311.181819] x11: 0000000000000000 x10: 0000000000000990
> > [ 1311.187129] x9 : ffff800011f73e10 x8 : ffff800011f83ef0
> > [ 1311.192440] x7 : ffff000069b8f6c0 x6 : ffff000069b8ad70
> > [ 1311.197750] x5 : 0000013165fb6700 x4 : 4000000000000000
> > [ 1311.203061] x3 : ffff800011f73eb0 x2 : 0000000000000000
> > [ 1311.208371] x1 : 0000000000095e94 x0 : 00000000000000e0
> > [ 1311.213682] Call trace:
> > [ 1311.216131] arch_cpu_idle+0x10/0x18
> > [ 1311.219708] do_idle+0x1c4/0x290
> > [ 1311.222935] cpu_startup_entry+0x24/0x40
> > [ 1311.226856] rest_init+0xd4/0xe0
> > [ 1311.230087] arch_call_rest_init+0xc/0x14
> > [ 1311.234094] start_kernel+0x430/0x45c
> > [ 1311.243556] sysrq: CPU1:
> > [ 1311.246099] Call trace:
> > [ 1311.248557] dump_backtrace+0x0/0x158
> > [ 1311.252220] show_stack+0x14/0x20
> > [ 1311.255537] showacpu+0x70/0x80
> > [ 1311.258682] flush_smp_call_function_queue+0x74/0x150
> > [ 1311.263733] generic_smp_call_function_single_interrupt+0x10/0x18
> > [ 1311.269831] handle_IPI+0x138/0x168
> > [ 1311.273319] gic_handle_irq+0x144/0x148
> > [ 1311.277153] el1_irq+0xb8/0x180
> > [ 1311.280298] irq_work_sync+0x10/0x18
> > [ 1311.283875] cpufreq_stop_governor.part.20+0x1c/0x30
> > [ 1311.288839] cpufreq_online+0x5a0/0x860

And here you have cpufreq_online() calling cpufreq_stop_governor().

The only way that can happen is through cpufreq_add_policy_cpu()
AFAICS, because otherwise policy->governor would be NULL.

So cpufreq_stop_governor() is called from cpufreq_add_policy_cpu() and
it invokes the irq_work_sync() through the governor ->stop callback.

Now, the target irq_work can only be pending on an *online* CPU
sharing the policy with the one in question, so here the CPU going
online is waiting on the other CPU to run the irq_work, but for some
reason it cannot do that.

Note that this is after clearing the update_util pointer for all CPUs
sharing the policy and running synchronize_rcu(), so the irq_work must
have been queued earlier.

> > [ 1311.292673] cpuhp_cpufreq_online+0xc/0x18
> > [ 1311.296771] cpuhp_invoke_callback+0x88/0x1f8
> > [ 1311.301127] cpuhp_thread_fun+0xd8/0x160
> > [ 1311.305051] smpboot_thread_fn+0x200/0x2a8
> > [ 1311.309151] kthread+0xf0/0x120
> > [ 1311.312291] ret_from_fork+0x10/0x1c
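
To make the wait described above concrete, here is a toy user-space model in
Python. It is not kernel code and not the actual irq_work implementation: the
flag names mirror the kernel's IRQ_WORK_PENDING and IRQ_WORK_BUSY, but
ToyCpu, IrqWork, toy_irq_work_sync(), can_take_irq and max_spins are all
hypothetical names made up for this sketch. The real irq_work_sync() has no
spin bound; the bound exists here only so that the demo terminates.

```python
from dataclasses import dataclass, field
from typing import List

# Flag names borrowed from the kernel's irq_work; the rest is a toy model.
IRQ_WORK_PENDING = 0x1
IRQ_WORK_BUSY = 0x2


@dataclass
class IrqWork:
    """Toy stand-in for struct irq_work: only the flag bits are modeled."""
    flags: int = 0


@dataclass
class ToyCpu:
    """Hypothetical CPU holding irq_work items until it takes an interrupt."""
    can_take_irq: bool = True
    queue: List[IrqWork] = field(default_factory=list)

    def queue_work(self, work: IrqWork) -> bool:
        # A work item that is already pending cannot be queued again.
        if work.flags & IRQ_WORK_PENDING:
            return False
        work.flags |= IRQ_WORK_PENDING | IRQ_WORK_BUSY
        self.queue.append(work)
        return True

    def tick(self) -> None:
        # One opportunity for this CPU to run its queued irq_work items.
        if not self.can_take_irq:
            return
        while self.queue:
            work = self.queue.pop(0)
            work.flags &= ~IRQ_WORK_PENDING
            # The handler itself is not modeled; it would run at this point.
            work.flags &= ~IRQ_WORK_BUSY


def toy_irq_work_sync(work: IrqWork, owner: ToyCpu, max_spins: int = 1000) -> bool:
    """Spin until BUSY clears; give up after max_spins (demo-only bound)."""
    for _ in range(max_spins):
        if not (work.flags & IRQ_WORK_BUSY):
            return True
        owner.tick()  # stands in for time passing on the other CPU
    return not (work.flags & IRQ_WORK_BUSY)


if __name__ == "__main__":
    healthy, stuck = ToyCpu(), ToyCpu(can_take_irq=False)
    a, b = IrqWork(), IrqWork()
    healthy.queue_work(a)
    stuck.queue_work(b)
    print("healthy CPU:", toy_irq_work_sync(a, healthy))
    print("stuck CPU:", toy_irq_work_sync(b, stuck, max_spins=50))
```

In this model the waiter makes progress only if the CPU holding the queued
work gets to run it. That matches the shape of the hang in the trace, though
the model says nothing about why the other CPU fails to run the irq_work on
the real system.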