Linux-PM Archive on lore.kernel.org
* Re: About CPU hot-plug stress test failed in cpufreq driver
       [not found] <DB3PR0402MB391626A8ECFDC182C6EDCF8DF54E0@DB3PR0402MB3916.eurprd04.prod.outlook.com>
@ 2019-11-21  9:35 ` Viresh Kumar
  2019-11-21 10:13   ` Anson Huang
  2019-11-21 10:37   ` Rafael J. Wysocki
  0 siblings, 2 replies; 57+ messages in thread
From: Viresh Kumar @ 2019-11-21  9:35 UTC (permalink / raw)
  To: Anson Huang; +Cc: Jacky Bai, rafael, linux-pm

+Rafael and PM list.

Please provide the output of the following command for your platform while I
have a look at your problem.

grep . /sys/devices/system/cpu/cpufreq/*/*

On 21-11-19, 09:17, Anson Huang wrote:
> Hi, Viresh
> Sorry to bother you directly via mail.
> We met something wrong with the cpufreq governor driver during a CPU hot-plug
> stress test on v5.4-rc7. The log is below. From debugging, it looks like the
> irq_work is still pending on a CPU which is already offline, so the next CPU
> going on/off line will call cpufreq_stop_governor() and wait for the previous
> irq_work to become free for use; but since it is pending on an offline CPU,
> the wait will never succeed. Do you have any idea about it or any info on it?
> Thanks a lot in advance!
> 
> [ 1062.437497] smp_test.sh     D    0   584    477 0x00000200
> [ 1062.442986] Call trace:
> [ 1062.445445]  __switch_to+0xb4/0x200
> [ 1062.448937]  __schedule+0x304/0x5b0
> [ 1062.452423]  schedule+0x40/0xd0
> [ 1062.455565]  schedule_timeout+0x16c/0x268
> [ 1062.459574]  wait_for_completion+0xa0/0x120
> [ 1062.463758]  __cpuhp_kick_ap+0x54/0x68
> [ 1062.467506]  cpuhp_kick_ap+0x38/0xa8
> [ 1062.471079]  bringup_cpu+0xbc/0xe8
> [ 1062.474480]  cpuhp_invoke_callback+0x88/0x1f8
> [ 1062.478835]  _cpu_up+0xe8/0x1e0
> [ 1062.481975]  do_cpu_up+0x98/0xb8
> [ 1062.485202]  cpu_up+0x10/0x18
> [ 1062.488172]  cpu_subsys_online+0x48/0xa0
> [ 1062.492094]  device_online+0x68/0xb0
> [ 1062.495667]  online_store+0xa8/0xb8
> [ 1062.499157]  dev_attr_store+0x14/0x28
> [ 1062.502820]  sysfs_kf_write+0x48/0x58
> [ 1062.506480]  kernfs_fop_write+0xe0/0x1f8
> [ 1062.510404]  __vfs_write+0x18/0x40
> [ 1062.513806]  vfs_write+0x19c/0x1f0
> [ 1062.517206]  ksys_write+0x64/0xf0
> [ 1062.520520]  __arm64_sys_write+0x14/0x20
> [ 1062.524446]  el0_svc_common.constprop.2+0xb0/0x168
> [ 1062.529236]  el0_svc_handler+0x20/0x80
> [ 1062.532984]  el0_svc+0x8/0xc
> [ 1062.535868] kworker/0:2     D    0  5496      2 0x00000228
> [ 1062.541361] Workqueue: events vmstat_shepherd
> [ 1062.545717] Call trace:
> [ 1062.548163]  __switch_to+0xb4/0x200
> [ 1062.551650]  __schedule+0x304/0x5b0
> [ 1062.555137]  schedule+0x40/0xd0
> [ 1062.558278]  rwsem_down_read_slowpath+0x200/0x4c0
> [ 1062.562984]  __down_read+0x9c/0xc0
> [ 1062.566385]  __percpu_down_read+0x6c/0xd8
> [ 1062.570393]  cpus_read_lock+0x70/0x78
> [ 1062.574054]  vmstat_shepherd+0x30/0xd0
> [ 1062.577804]  process_one_work+0x1dc/0x370
> [ 1062.581812]  worker_thread+0x48/0x468
> [ 1062.585475]  kthread+0xf0/0x120
> [ 1062.588615]  ret_from_fork+0x10/0x1c
> 
>   [ 1311.095934] sysrq: Show backtrace of all active CPUs
> [ 1311.100913] sysrq: CPU0:
> [ 1311.103450] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7-03224-g9b510305bd68-dirty #28
> [ 1311.111972] Hardware name: NXP i.MX8MNano DDR4 EVK board (DT)
> [ 1311.117717] pstate: 40000005 (nZcv daif -PAN -UAO)
> [ 1311.122519] pc : arch_cpu_idle+0x10/0x18
> [ 1311.126445] lr : default_idle_call+0x18/0x30
> [ 1311.130712] sp : ffff800011f73ec0
> [ 1311.134024] x29: ffff800011f73ec0 x28: 0000000041dd0018
> [ 1311.139335] x27: 0000000000000400 x26: 0000000000000000
> [ 1311.144646] x25: 0000000000000000 x24: ffff800011f7a220
> [ 1311.149956] x23: ffff800011f79000 x22: ffff800011b42bf8
> [ 1311.155267] x21: ffff800011f7a000 x20: 0000000000000001
> [ 1311.160578] x19: ffff800011f7a120 x18: 0000000000000000
> [ 1311.165888] x17: 0000000000000000 x16: 0000000000000000
> [ 1311.171198] x15: 0000000000000000 x14: 0000000000000000
> [ 1311.176508] x13: 0000000000000001 x12: 0000000000000000
> [ 1311.181819] x11: 0000000000000000 x10: 0000000000000990
> [ 1311.187129] x9 : ffff800011f73e10 x8 : ffff800011f83ef0
> [ 1311.192440] x7 : ffff000069b8f6c0 x6 : ffff000069b8ad70
> [ 1311.197750] x5 : 0000013165fb6700 x4 : 4000000000000000
> [ 1311.203061] x3 : ffff800011f73eb0 x2 : 0000000000000000
> [ 1311.208371] x1 : 0000000000095e94 x0 : 00000000000000e0
> [ 1311.213682] Call trace:
> [ 1311.216131]  arch_cpu_idle+0x10/0x18
> [ 1311.219708]  do_idle+0x1c4/0x290
> [ 1311.222935]  cpu_startup_entry+0x24/0x40
> [ 1311.226856]  rest_init+0xd4/0xe0
> [ 1311.230087]  arch_call_rest_init+0xc/0x14
> [ 1311.234094]  start_kernel+0x430/0x45c
> [ 1311.243556] sysrq: CPU1:
> [ 1311.246099] Call trace:
> [ 1311.248557]  dump_backtrace+0x0/0x158
> [ 1311.252220]  show_stack+0x14/0x20
> [ 1311.255537]  showacpu+0x70/0x80
> [ 1311.258682]  flush_smp_call_function_queue+0x74/0x150
> [ 1311.263733]  generic_smp_call_function_single_interrupt+0x10/0x18
> [ 1311.269831]  handle_IPI+0x138/0x168
> [ 1311.273319]  gic_handle_irq+0x144/0x148
> [ 1311.277153]  el1_irq+0xb8/0x180
> [ 1311.280298]  irq_work_sync+0x10/0x18
> [ 1311.283875]  cpufreq_stop_governor.part.20+0x1c/0x30
> [ 1311.288839]  cpufreq_online+0x5a0/0x860
> [ 1311.292673]  cpuhp_cpufreq_online+0xc/0x18
> [ 1311.296771]  cpuhp_invoke_callback+0x88/0x1f8
> [ 1311.301127]  cpuhp_thread_fun+0xd8/0x160
> [ 1311.305051]  smpboot_thread_fn+0x200/0x2a8
> [ 1311.309151]  kthread+0xf0/0x120
> [ 1311.312291]  ret_from_fork+0x10/0x1c
> 
> 
> Anson

-- 
viresh


* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-21  9:35 ` About CPU hot-plug stress test failed in cpufreq driver Viresh Kumar
@ 2019-11-21 10:13   ` Anson Huang
  2019-11-21 10:53     ` Rafael J. Wysocki
  2019-11-21 10:37   ` Rafael J. Wysocki
  1 sibling, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-11-21 10:13 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Jacky Bai, rafael, linux-pm

Thanks, Viresh, for your quick response.
The cpufreq info output is below. Some more information: our internal tree is based on v5.4-rc7,
and CPU hotplug involves no i.MX platform code. So far we have reproduced the issue on i.MX8QXP, i.MX8QM and i.MX8MN.
With cpufreq disabled, the issue does not occur.
I also reproduced this issue with v5.4-rc7.
I will continue to debug and let you know if I find anything new.

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> +Rafael and PM list.
> 
> Please provide output of following for your platform while I am having a look
> at your problem.
> 
> grep . /sys/devices/system/cpu/cpufreq/*/*

root@imx8qxpmek:~# grep . /sys/devices/system/cpu/cpufreq/*/*
/sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load:0
/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy:0
/sys/devices/system/cpu/cpufreq/ondemand/powersave_bias:0
/sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor:1
/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate:10000
/sys/devices/system/cpu/cpufreq/ondemand/up_threshold:95
/sys/devices/system/cpu/cpufreq/policy0/affected_cpus:0 1 2 3
/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq:900000
/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_max_freq:1200000
/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_min_freq:900000
/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:150000
/sys/devices/system/cpu/cpufreq/policy0/related_cpus:0 1 2 3
/sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies:900000 1200000
/sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors:ondemand userspace performance schedutil
/sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:900000
/sys/devices/system/cpu/cpufreq/policy0/scaling_driver:cpufreq-dt
/sys/devices/system/cpu/cpufreq/policy0/scaling_governor:ondemand
/sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:1200000
/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq:900000
/sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed:<unsupported>
grep: /sys/devices/system/cpu/cpufreq/policy0/stats: Is a directory


CPUHotplug: 4524 times remaining
[ 5954.441803] CPU1: shutdown
[ 5954.444529] psci: CPU1 killed.
[ 5954.481739] CPU2: shutdown
[ 5954.484484] psci: CPU2 killed.
[ 5954.530509] CPU3: shutdown
[ 5954.533270] psci: CPU3 killed.
[ 5955.561978] Detected VIPT I-cache on CPU1
[ 5955.562015] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[ 5955.562073] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[ 5955.596921] Detected VIPT I-cache on CPU2
[ 5955.596959] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[ 5955.597018] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[ 5955.645878] Detected VIPT I-cache on CPU3
[ 5955.645921] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[ 5955.645986] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4523 times remaining
[ 5956.769790] CPU1: shutdown
[ 5956.772518] psci: CPU1 killed.
[ 5956.809752] CPU2: shutdown
[ 5956.812480] psci: CPU2 killed.
[ 5956.849769] CPU3: shutdown
[ 5956.852494] psci: CPU3 killed.
[ 5957.882045] Detected VIPT I-cache on CPU1
[ 5957.882089] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[ 5957.882153] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]


It loops here. The system does not hang and still responds on the debug console. With JTAG attached, I can see that CPU1
is busy-waiting for the irq_work to become free.


Anson


* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-21  9:35 ` About CPU hot-plug stress test failed in cpufreq driver Viresh Kumar
  2019-11-21 10:13   ` Anson Huang
@ 2019-11-21 10:37   ` Rafael J. Wysocki
  1 sibling, 0 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-11-21 10:37 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Anson Huang, Jacky Bai, Rafael J. Wysocki, Linux PM

On Thu, Nov 21, 2019 at 10:36 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> +Rafael and PM list.
>
> Please provide output of following for your platform while I am having a look at
> your problem.
>
> grep . /sys/devices/system/cpu/cpufreq/*/*
>
> On 21-11-19, 09:17, Anson Huang wrote:
> > Hi, Viresh
> > Sorry to bother you directly via mail.
> > We met something wrong with cpufreq governor driver during CPU hot-plug stress
> > test on v5.4-rc7, below is the log, from debug, looks like the irq_work is
> > still pending on a CPU which is already offline,

I'm really not sure if this conclusion can be drawn from the log below.

> > so next CPU being on/off line
> > will call the cpufreq_stop_governor and it will wait for previous irq_work
> > free for use, but since it is pending on an offline CPU, so it will never
> > success. Do you have any idea of it or have any info about it? Thanks a lot in
> > advanced!

So this is just blocked __cpuhp_kick_ap() waiting forever AFAICS.

> >
> > [ 1062.437497] smp_test.sh     D    0   584    477 0x00000200
> > [ 1062.442986] Call trace:
> > [ 1062.445445]  __switch_to+0xb4/0x200
> > [ 1062.448937]  __schedule+0x304/0x5b0
> > [ 1062.452423]  schedule+0x40/0xd0
> > [ 1062.455565]  schedule_timeout+0x16c/0x268
> > [ 1062.459574]  wait_for_completion+0xa0/0x120
> > [ 1062.463758]  __cpuhp_kick_ap+0x54/0x68
> > [ 1062.467506]  cpuhp_kick_ap+0x38/0xa8
> > [ 1062.471079]  bringup_cpu+0xbc/0xe8
> > [ 1062.474480]  cpuhp_invoke_callback+0x88/0x1f8
> > [ 1062.478835]  _cpu_up+0xe8/0x1e0
> > [ 1062.481975]  do_cpu_up+0x98/0xb8
> > [ 1062.485202]  cpu_up+0x10/0x18
> > [ 1062.488172]  cpu_subsys_online+0x48/0xa0
> > [ 1062.492094]  device_online+0x68/0xb0
> > [ 1062.495667]  online_store+0xa8/0xb8
> > [ 1062.499157]  dev_attr_store+0x14/0x28
> > [ 1062.502820]  sysfs_kf_write+0x48/0x58
> > [ 1062.506480]  kernfs_fop_write+0xe0/0x1f8
> > [ 1062.510404]  __vfs_write+0x18/0x40
> > [ 1062.513806]  vfs_write+0x19c/0x1f0
> > [ 1062.517206]  ksys_write+0x64/0xf0
> > [ 1062.520520]  __arm64_sys_write+0x14/0x20
> > [ 1062.524446]  el0_svc_common.constprop.2+0xb0/0x168
> > [ 1062.529236]  el0_svc_handler+0x20/0x80
> > [ 1062.532984]  el0_svc+0x8/0xc
> > [ 1062.535868] kworker/0:2     D    0  5496      2 0x00000228
> > [ 1062.541361] Workqueue: events vmstat_shepherd
> > [ 1062.545717] Call trace:
> > [ 1062.548163]  __switch_to+0xb4/0x200
> > [ 1062.551650]  __schedule+0x304/0x5b0
> > [ 1062.555137]  schedule+0x40/0xd0
> > [ 1062.558278]  rwsem_down_read_slowpath+0x200/0x4c0
> > [ 1062.562984]  __down_read+0x9c/0xc0
> > [ 1062.566385]  __percpu_down_read+0x6c/0xd8
> > [ 1062.570393]  cpus_read_lock+0x70/0x78
> > [ 1062.574054]  vmstat_shepherd+0x30/0xd0
> > [ 1062.577804]  process_one_work+0x1dc/0x370
> > [ 1062.581812]  worker_thread+0x48/0x468
> > [ 1062.585475]  kthread+0xf0/0x120
> > [ 1062.588615]  ret_from_fork+0x10/0x1c
> >
> >   [ 1311.095934] sysrq: Show backtrace of all active CPUs
> > [ 1311.100913] sysrq: CPU0:
> > [ 1311.103450] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc7-03224-g9b510305bd68-dirty #28
> > [ 1311.111972] Hardware name: NXP i.MX8MNano DDR4 EVK board (DT)
> > [ 1311.117717] pstate: 40000005 (nZcv daif -PAN -UAO)
> > [ 1311.122519] pc : arch_cpu_idle+0x10/0x18
> > [ 1311.126445] lr : default_idle_call+0x18/0x30
> > [ 1311.130712] sp : ffff800011f73ec0
> > [ 1311.134024] x29: ffff800011f73ec0 x28: 0000000041dd0018
> > [ 1311.139335] x27: 0000000000000400 x26: 0000000000000000
> > [ 1311.144646] x25: 0000000000000000 x24: ffff800011f7a220
> > [ 1311.149956] x23: ffff800011f79000 x22: ffff800011b42bf8
> > [ 1311.155267] x21: ffff800011f7a000 x20: 0000000000000001
> > [ 1311.160578] x19: ffff800011f7a120 x18: 0000000000000000
> > [ 1311.165888] x17: 0000000000000000 x16: 0000000000000000
> > [ 1311.171198] x15: 0000000000000000 x14: 0000000000000000
> > [ 1311.176508] x13: 0000000000000001 x12: 0000000000000000
> > [ 1311.181819] x11: 0000000000000000 x10: 0000000000000990
> > [ 1311.187129] x9 : ffff800011f73e10 x8 : ffff800011f83ef0
> > [ 1311.192440] x7 : ffff000069b8f6c0 x6 : ffff000069b8ad70
> > [ 1311.197750] x5 : 0000013165fb6700 x4 : 4000000000000000
> > [ 1311.203061] x3 : ffff800011f73eb0 x2 : 0000000000000000
> > [ 1311.208371] x1 : 0000000000095e94 x0 : 00000000000000e0
> > [ 1311.213682] Call trace:
> > [ 1311.216131]  arch_cpu_idle+0x10/0x18
> > [ 1311.219708]  do_idle+0x1c4/0x290
> > [ 1311.222935]  cpu_startup_entry+0x24/0x40
> > [ 1311.226856]  rest_init+0xd4/0xe0
> > [ 1311.230087]  arch_call_rest_init+0xc/0x14
> > [ 1311.234094]  start_kernel+0x430/0x45c
> > [ 1311.243556] sysrq: CPU1:
> > [ 1311.246099] Call trace:
> > [ 1311.248557]  dump_backtrace+0x0/0x158
> > [ 1311.252220]  show_stack+0x14/0x20
> > [ 1311.255537]  showacpu+0x70/0x80
> > [ 1311.258682]  flush_smp_call_function_queue+0x74/0x150
> > [ 1311.263733]  generic_smp_call_function_single_interrupt+0x10/0x18
> > [ 1311.269831]  handle_IPI+0x138/0x168
> > [ 1311.273319]  gic_handle_irq+0x144/0x148
> > [ 1311.277153]  el1_irq+0xb8/0x180
> > [ 1311.280298]  irq_work_sync+0x10/0x18
> > [ 1311.283875]  cpufreq_stop_governor.part.20+0x1c/0x30
> > [ 1311.288839]  cpufreq_online+0x5a0/0x860

And here you have cpufreq_online() calling cpufreq_stop_governor().

The only way that can happen is through cpufreq_add_policy_cpu()
AFAICS, because otherwise policy->governor would be NULL.

So cpufreq_stop_governor() is called from cpufreq_add_policy_cpu() and
it invokes the irq_work_sync() through the governor ->stop callback.
Now, the target irq_work can only be pending on an *online* CPU
sharing the policy with the one in question, so here the CPU going
online is waiting on the other CPU to run the irq_work, but for some
reason it cannot do that.

Note that this is after clearing the update_util pointer for all CPUs
sharing the policy and running synchronize_rcu(), so the irq_work must
have been queued earlier.

> > [ 1311.292673]  cpuhp_cpufreq_online+0xc/0x18
> > [ 1311.296771]  cpuhp_invoke_callback+0x88/0x1f8
> > [ 1311.301127]  cpuhp_thread_fun+0xd8/0x160
> > [ 1311.305051]  smpboot_thread_fn+0x200/0x2a8
> > [ 1311.309151]  kthread+0xf0/0x120
> > [ 1311.312291]  ret_from_fork+0x10/0x1c
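
The wait described above can be modelled in user space. The sketch below is an illustration only, not kernel code: the names model_work, model_claim(), model_run() and model_sync_bounded() are hypothetical, the flag values are assumptions chosen for the model, and the spin is bounded so that the example terminates. It also clears the busy flag unconditionally after the callback, which is a simplification. What it shows is that a waiter on the busy flag can only make progress once the CPU holding the work actually runs it.

```c
/* User-space model of a pending/busy flag handshake. Not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag layout for this model only. */
#define WORK_PENDING 0x1u  /* queued, callback not started yet */
#define WORK_BUSY    0x2u  /* claimed, cleared only after the callback ran */
#define WORK_CLAIMED (WORK_PENDING | WORK_BUSY)

struct model_work {
	atomic_uint flags;
	void (*func)(struct model_work *);
	int runs;  /* how many times the callback has run */
};

/* Claim the work for queueing. Fails if it is already pending. */
bool model_claim(struct model_work *w)
{
	unsigned int old;

	assert(w != NULL);
	old = atomic_fetch_or(&w->flags, WORK_CLAIMED);
	return !(old & WORK_PENDING);
}

/* What the CPU holding the work does when it finally processes it. */
void model_run(struct model_work *w)
{
	assert(w != NULL);
	atomic_fetch_and(&w->flags, ~WORK_PENDING);
	if (w->func)
		w->func(w);
	/* Simplification: drop BUSY unconditionally once the callback is done. */
	atomic_fetch_and(&w->flags, ~WORK_BUSY);
}

/* Bounded stand-in for the wait loop: true once BUSY is clear. */
bool model_sync_bounded(struct model_work *w, unsigned long max_spins)
{
	assert(w != NULL);
	while (atomic_load(&w->flags) & WORK_BUSY) {
		if (max_spins-- == 0)
			return false;  /* nobody ran the work; gave up */
	}
	return true;
}

/* Example callback: just count invocations. */
void count_run(struct model_work *w)
{
	w->runs++;
}
```

In this model, a claimed work item that nobody runs keeps model_sync_bounded() returning false no matter how large the spin budget is, which is the shape of the stall in the CPU1 backtrace; a single model_run() call releases the waiter.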


* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-21 10:13   ` Anson Huang
@ 2019-11-21 10:53     ` Rafael J. Wysocki
  2019-11-21 10:56       ` Rafael J. Wysocki
  0 siblings, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-11-21 10:53 UTC (permalink / raw)
  To: Anson Huang; +Cc: Viresh Kumar, Jacky Bai, rafael, linux-pm

On Thu, Nov 21, 2019 at 11:13 AM Anson Huang <anson.huang@nxp.com> wrote:
>
> Thanks Viresh for your quick response.
> The output of cpufreq info are as below, some more info for you are, our internal tree is based on v5.4-rc7,
> and the CPU hotplug has no i.MX platform code, so far we reproduced it on i.MX8QXP, i.MX8QM and i.MX8MN.
> With cpufreq disabled, no issue met.
> I also reproduced this issue with v5.4-rc7,
> Will continue to debug and let you know if any new found.
>
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> >
> > +Rafael and PM list.
> >
> > Please provide output of following for your platform while I am having a look
> > at your problem.
> >
> > grep . /sys/devices/system/cpu/cpufreq/*/*
>
> root@imx8qxpmek:~# grep . /sys/devices/system/cpu/cpufreq/*/*
> /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load:0
> /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy:0
> /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias:0
> /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor:1
> /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate:10000
> /sys/devices/system/cpu/cpufreq/ondemand/up_threshold:95
> /sys/devices/system/cpu/cpufreq/policy0/affected_cpus:0 1 2 3

All CPUs in one policy, CPU0 is the policy CPU and it never goes offline AFAICS.

> /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq:900000
> /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_max_freq:1200000
> /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_min_freq:900000
> /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:150000
> /sys/devices/system/cpu/cpufreq/policy0/related_cpus:0 1 2 3
> /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies:900000 1200000
> /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors:ondemand userspace performance schedutil
> /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:900000
> /sys/devices/system/cpu/cpufreq/policy0/scaling_driver:cpufreq-dt
> /sys/devices/system/cpu/cpufreq/policy0/scaling_governor:ondemand

Hm.  That shouldn't really make a difference, but I'm wondering if you
can reproduce this with the schedutil governor?

> /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:1200000
> /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq:900000
> /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed:<unsupported>
> grep: /sys/devices/system/cpu/cpufreq/policy0/stats: Is a directory
>
>
> CPUHotplug: 4524 times remaining
> [ 5954.441803] CPU1: shutdown
> [ 5954.444529] psci: CPU1 killed.
> [ 5954.481739] CPU2: shutdown
> [ 5954.484484] psci: CPU2 killed.
> [ 5954.530509] CPU3: shutdown
> [ 5954.533270] psci: CPU3 killed.
> [ 5955.561978] Detected VIPT I-cache on CPU1
> [ 5955.562015] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> [ 5955.562073] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> [ 5955.596921] Detected VIPT I-cache on CPU2
> [ 5955.596959] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> [ 5955.597018] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
> [ 5955.645878] Detected VIPT I-cache on CPU3
> [ 5955.645921] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> [ 5955.645986] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
> CPUHotplug: 4523 times remaining
> [ 5956.769790] CPU1: shutdown
> [ 5956.772518] psci: CPU1 killed.
> [ 5956.809752] CPU2: shutdown
> [ 5956.812480] psci: CPU2 killed.
> [ 5956.849769] CPU3: shutdown
> [ 5956.852494] psci: CPU3 killed.
> [ 5957.882045] Detected VIPT I-cache on CPU1
> [ 5957.882089] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> [ 5957.882153] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
>
>
> Looping here, no hang, can response to debug console.... if attaching JTAG, I can see the CPU1
> Will busy waiting for irq_work to be free..

Well, cpufreq_offline() calls cpufreq_stop_governor() too, so there
shouldn't be any pending irq_works coming from cpufreq on the offline
CPUs after that.

Hence, if an irq_work is pending at the cpufreq_online() time, it must
be on CPU0 (which is always online).


* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-21 10:53     ` Rafael J. Wysocki
@ 2019-11-21 10:56       ` Rafael J. Wysocki
  2019-11-22  5:15         ` Anson Huang
  0 siblings, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-11-21 10:56 UTC (permalink / raw)
  To: Anson Huang; +Cc: Viresh Kumar, Jacky Bai, rafael, linux-pm

On Thu, Nov 21, 2019 at 11:53 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Nov 21, 2019 at 11:13 AM Anson Huang <anson.huang@nxp.com> wrote:
> >
> > Thanks Viresh for your quick response.
> > The output of cpufreq info are as below, some more info for you are, our internal tree is based on v5.4-rc7,
> > and the CPU hotplug has no i.MX platform code, so far we reproduced it on i.MX8QXP, i.MX8QM and i.MX8MN.
> > With cpufreq disabled, no issue met.
> > I also reproduced this issue with v5.4-rc7,
> > Will continue to debug and let you know if any new found.
> >
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > >
> > > +Rafael and PM list.
> > >
> > > Please provide output of following for your platform while I am having a look
> > > at your problem.
> > >
> > > grep . /sys/devices/system/cpu/cpufreq/*/*
> >
> > root@imx8qxpmek:~# grep . /sys/devices/system/cpu/cpufreq/*/*
> > /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load:0
> > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy:0
> > /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias:0
> > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor:1
> > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate:10000
> > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold:95
> > /sys/devices/system/cpu/cpufreq/policy0/affected_cpus:0 1 2 3
>
> All CPUs in one policy, CPU0 is the policy CPU and it never goes offline AFAICS.
>
> > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq:900000
> > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_max_freq:1200000
> > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_min_freq:900000
> > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:150000
> > /sys/devices/system/cpu/cpufreq/policy0/related_cpus:0 1 2 3
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies:900000 1200000
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors:ondemand userspace performance schedutil
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:900000
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_driver:cpufreq-dt
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor:ondemand
>
> Hm.  That shouldn't really make a difference, but I'm wondering if you
> can reproduce this with the schedutil governor?
>
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:1200000
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq:900000
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed:<unsupported>
> > grep: /sys/devices/system/cpu/cpufreq/policy0/stats: Is a directory
> >
> >
> > CPUHotplug: 4524 times remaining
> > [ 5954.441803] CPU1: shutdown
> > [ 5954.444529] psci: CPU1 killed.
> > [ 5954.481739] CPU2: shutdown
> > [ 5954.484484] psci: CPU2 killed.
> > [ 5954.530509] CPU3: shutdown
> > [ 5954.533270] psci: CPU3 killed.
> > [ 5955.561978] Detected VIPT I-cache on CPU1
> > [ 5955.562015] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> > [ 5955.562073] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> > [ 5955.596921] Detected VIPT I-cache on CPU2
> > [ 5955.596959] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> > [ 5955.597018] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
> > [ 5955.645878] Detected VIPT I-cache on CPU3
> > [ 5955.645921] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> > [ 5955.645986] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
> > CPUHotplug: 4523 times remaining
> > [ 5956.769790] CPU1: shutdown
> > [ 5956.772518] psci: CPU1 killed.
> > [ 5956.809752] CPU2: shutdown
> > [ 5956.812480] psci: CPU2 killed.
> > [ 5956.849769] CPU3: shutdown
> > [ 5956.852494] psci: CPU3 killed.
> > [ 5957.882045] Detected VIPT I-cache on CPU1
> > [ 5957.882089] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> > [ 5957.882153] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> >
> >
> > Looping here, no hang, can response to debug console.... if attaching JTAG, I can see the CPU1
> > Will busy waiting for irq_work to be free..
>
> Well, cpufreq_offline() calls cpufreq_stop_governor() too, so there
> shouldn't be any pending irq_works coming from cpufreq on the offline
> CPUs after that.
>
> Hence, if an irq_work is pending at the cpufreq_online() time, it must
> be on CPU0 (which is always online).

Let me rephrase this: If an irq_work is pending at the
cpufreq_online() time, it must be on an online CPU, which is CPU0 if
all of the other CPUs are offline.
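
The reasoning above can be written down as an invariant: as long as a CPU drains its pending work before it goes offline, a pending irq_work can only sit on an online CPU. The following user-space sketch is a toy model under that assumption, not kernel code; model_state, model_queue_on(), model_offline() and the other names are hypothetical, and only one work item is tracked. The drain flag exists purely to show what the state would look like if the drain were somehow skipped, which is the situation the original report suspects.

```c
/* Toy model: one work item, four CPUs, optional drain on offline. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4
#define MODEL_NO_CPU  (-1)

struct model_state {
	bool online[MODEL_NR_CPUS];
	int pending_cpu;  /* CPU holding the work, or MODEL_NO_CPU */
};

void model_init(struct model_state *s)
{
	assert(s != NULL);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		s->online[cpu] = true;
	s->pending_cpu = MODEL_NO_CPU;
}

/* Queue the work on a CPU. Refused if the CPU is offline or work is pending. */
bool model_queue_on(struct model_state *s, int cpu)
{
	assert(s != NULL && cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!s->online[cpu] || s->pending_cpu != MODEL_NO_CPU)
		return false;
	s->pending_cpu = cpu;
	return true;
}

/* The CPU runs whatever is pending on it. */
void model_run_on(struct model_state *s, int cpu)
{
	assert(s != NULL && cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (s->pending_cpu == cpu)
		s->pending_cpu = MODEL_NO_CPU;
}

/* Take a CPU offline; with drain set, it runs its pending work first. */
void model_offline(struct model_state *s, int cpu, bool drain)
{
	assert(s != NULL && cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (drain)
		model_run_on(s, cpu);
	s->online[cpu] = false;
}

void model_online(struct model_state *s, int cpu)
{
	assert(s != NULL && cpu >= 0 && cpu < MODEL_NR_CPUS);
	s->online[cpu] = true;
}

/* A waiter can finish only if nothing is pending, or the holder is online. */
bool model_wait_can_finish(const struct model_state *s)
{
	assert(s != NULL);
	return s->pending_cpu == MODEL_NO_CPU || s->online[s->pending_cpu];
}
```

With drain set on every model_offline() call, model_wait_can_finish() stays true across any sequence of operations. It turns false only when a CPU goes offline with the work still recorded against it.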


* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-21 10:56       ` Rafael J. Wysocki
@ 2019-11-22  5:15         ` Anson Huang
  2019-11-22  9:59           ` Rafael J. Wysocki
  0 siblings, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-11-22  5:15 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, Jacky Bai, linux-pm

Hi, Rafael
	Theoretically, yes: a CPU going offline runs its irq_work list so that any irq_work pending on it is cleared. In practice that is not what happens; both the ondemand and schedutil governors reproduce this issue under the CPU hotplug stress test.
	I tried adding an "int cpu" field to the irq_work structure to record the number of the CPU on which the irq_work is pending. When the issue happens, I can see that the irq_work is pending on CPU #3, which is already offline. That is why the issue happens, but I don't know how it gets into that state...

diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index b11fcdf..f8da06f9 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -25,6 +25,7 @@ struct irq_work {
        unsigned long flags;
        struct llist_node llnode;
        void (*func)(struct irq_work *);
+       int cpu;
 };

 static inline
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index d42acaf..2e893d5 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -10,6 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/irq_work.h>
+#include <linux/jiffies.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
 #include <linux/irqflags.h>
@@ -78,6 +79,7 @@ bool irq_work_queue(struct irq_work *work)
        if (!irq_work_claim(work))
                return false;

+       work->cpu = smp_processor_id();
        /* Queue the entry and raise the IPI if needed. */
        preempt_disable();
        __irq_work_queue_local(work);
@@ -105,6 +107,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
        /* Only queue if not already pending */
        if (!irq_work_claim(work))
                return false;
+       work->cpu = cpu;

        preempt_disable();
        if (cpu != smp_processor_id()) {
@@ -161,6 +164,7 @@ static void irq_work_run_list(struct llist_head *list)
                 */
                flags = work->flags & ~IRQ_WORK_PENDING;
                xchg(&work->flags, flags);
+               work->cpu = -1;

                work->func(work);
                /*
@@ -197,9 +201,13 @@ void irq_work_tick(void)
  */
 void irq_work_sync(struct irq_work *work)
 {
+       unsigned long timeout = jiffies + msecs_to_jiffies(500);
        lockdep_assert_irqs_enabled();

-       while (work->flags & IRQ_WORK_BUSY)
+       while (work->flags & IRQ_WORK_BUSY) {
+               if (time_after(jiffies, timeout))
+                       pr_warn("irq_work_sync 500ms timeout, work cpu %d\n", work->cpu);
                cpu_relax();
+       }
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);
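
The instrumented loop in the patch above never updates its deadline, so once the 500 ms have passed it calls pr_warn() on every iteration. One possible variant, sketched here in user space and not taken from any kernel tree, re-arms the deadline after each warning so that a stuck wait reports once per period. Everything in it is an assumption made for illustration: the tick counter is simulated and ignores wraparound, and wait_report, bounded_wait() and busy_until() are hypothetical names. The give_up_after limit exists only so that the example terminates.

```c
/* User-space sketch of a busy-wait that warns once per period. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct wait_report {
	unsigned long ticks_waited;  /* simulated ticks spent in the loop */
	unsigned int warnings;       /* how many times the deadline fired */
	bool completed;              /* true if the condition cleared */
};

/* Returns true while the waited-for condition still holds at tick `now`. */
typedef bool (*busy_fn)(void *ctx, unsigned long now);

struct wait_report bounded_wait(busy_fn still_busy, void *ctx,
				unsigned long warn_every,
				unsigned long give_up_after)
{
	struct wait_report r = { 0, 0, false };
	unsigned long now = 0;
	unsigned long deadline = warn_every;

	assert(still_busy != NULL);
	assert(warn_every > 0);

	while (still_busy(ctx, now)) {
		if (now >= give_up_after) {
			r.ticks_waited = now;
			return r;  /* still busy; stop the simulation */
		}
		if (now >= deadline) {
			r.warnings++;                 /* a real loop would log here */
			deadline = now + warn_every;  /* re-arm instead of flooding */
		}
		now++;  /* one simulated tick per spin */
	}

	r.completed = true;
	r.ticks_waited = now;
	return r;
}

/* Example condition: busy until the tick stored in *ctx is reached. */
bool busy_until(void *ctx, unsigned long now)
{
	const unsigned long *release_at = ctx;

	return now < *release_at;
}
```

With warn_every set to 500 ticks, a wait that clears at tick 1200 produces two warnings rather than one per spin after the first deadline.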


LOG:
[  312.638229] Detected VIPT I-cache on CPU1
[  312.638267] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  312.638326] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  312.673205] Detected VIPT I-cache on CPU2
[  312.673243] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[  312.673303] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[  312.722140] Detected VIPT I-cache on CPU3
[  312.722182] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[  312.722249] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4877 times remaining
[  313.854051] CPU1: shutdown
[  313.856778] psci: CPU1 killed.
[  313.894008] CPU2: shutdown
[  313.896764] psci: CPU2 killed.
[  313.934015] CPU3: shutdown
[  313.936736] psci: CPU3 killed.
[  314.970878] Detected VIPT I-cache on CPU1
[  314.970921] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  314.970987] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  315.009201] Detected VIPT I-cache on CPU2
[  315.009239] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[  315.009300] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[  315.058155] Detected VIPT I-cache on CPU3
[  315.058199] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[  315.058266] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4876 times remaining
[  316.182053] CPU1: shutdown
[  316.184776] psci: CPU1 killed.
[  316.222002] CPU2: shutdown
[  316.224729] psci: CPU2 killed.
[  316.262011] CPU3: shutdown
[  316.264734] psci: CPU3 killed.
[  317.298143] Detected VIPT I-cache on CPU1
[  317.298187] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  317.298253] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  317.833414] irq_work_sync 500ms timeout, work cpu 3
[  317.838318] irq_work_sync 500ms timeout, work cpu 3
[  317.843225] irq_work_sync 500ms timeout, work cpu 3
[  317.848130] irq_work_sync 500ms timeout, work cpu 3
[  317.853030] irq_work_sync 500ms timeout, work cpu 3
[  317.857932] irq_work_sync 500ms timeout, work cpu 3
[  317.862840] irq_work_sync 500ms timeout, work cpu 3



> On Thu, Nov 21, 2019 at 11:53 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Nov 21, 2019 at 11:13 AM Anson Huang <anson.huang@nxp.com> wrote:
> > >
> > > Thanks Viresh for your quick response.
> > > The output of cpufreq info are as below, some more info for you are,
> > > our internal tree is based on v5.4-rc7, and the CPU hotplug has no i.MX
> > > platform code, so far we reproduced it on i.MX8QXP, i.MX8QM and i.MX8MN.
> > > With cpufreq disabled, no issue met.
> > > I also reproduced this issue with v5.4-rc7, Will continue to debug
> > > and let you know if any new found.
> > >
> > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > > >
> > > > +Rafael and PM list.
> > > >
> > > > Please provide output of following for your platform while I am
> > > > having a look at your problem.
> > > >
> > > > grep . /sys/devices/system/cpu/cpufreq/*/*
> > >
> > > root@imx8qxpmek:~# grep . /sys/devices/system/cpu/cpufreq/*/*
> > > /sys/devices/system/cpu/cpufreq/ondemand/ignore_nice_load:0
> > > /sys/devices/system/cpu/cpufreq/ondemand/io_is_busy:0
> > > /sys/devices/system/cpu/cpufreq/ondemand/powersave_bias:0
> > > /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor:1
> > > /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate:10000
> > > /sys/devices/system/cpu/cpufreq/ondemand/up_threshold:95
> > > /sys/devices/system/cpu/cpufreq/policy0/affected_cpus:0 1 2 3
> >
> > All CPUs in one policy, CPU0 is the policy CPU and it never goes
> > offline AFAICS.
> >
> > > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_cur_freq:900000
> > > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_max_freq:1200000
> > > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_min_freq:900000
> > > > /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_transition_latency:150000
> > > > /sys/devices/system/cpu/cpufreq/policy0/related_cpus:0 1 2 3
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_available_frequencies:900000 1200000
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_available_governors:ondemand userspace performance schedutil
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:900000
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_driver:cpufreq-dt
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor:ondemand
> >
> > Hm.  That shouldn't really make a difference, but I'm wondering if you
> > can reproduce this with the schedutil governor?
> >
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq:1200000
> > > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq:900000
> > > > /sys/devices/system/cpu/cpufreq/policy0/scaling_setspeed:<unsupported>
> > > grep: /sys/devices/system/cpu/cpufreq/policy0/stats: Is a directory
> > >
> > >
> > > CPUHotplug: 4524 times remaining
> > > [ 5954.441803] CPU1: shutdown
> > > [ 5954.444529] psci: CPU1 killed.
> > > [ 5954.481739] CPU2: shutdown
> > > [ 5954.484484] psci: CPU2 killed.
> > > [ 5954.530509] CPU3: shutdown
> > > [ 5954.533270] psci: CPU3 killed.
> > > [ 5955.561978] Detected VIPT I-cache on CPU1 [ 5955.562015] GICv3:
> > > CPU1: found redistributor 1 region 0:0x0000000051b20000 [
> > > 5955.562073] CPU1: Booted secondary processor 0x0000000001
> > > [0x410fd042] [ 5955.596921] Detected VIPT I-cache on CPU2 [
> > > 5955.596959] GICv3: CPU2: found redistributor 2 region
> > > 0:0x0000000051b40000 [ 5955.597018] CPU2: Booted secondary
> processor
> > > 0x0000000002 [0x410fd042] [ 5955.645878] Detected VIPT I-cache on
> > > CPU3 [ 5955.645921] GICv3: CPU3: found redistributor 3 region
> > > 0:0x0000000051b60000 [ 5955.645986] CPU3: Booted secondary
> processor
> > > 0x0000000003 [0x410fd042]
> > > CPUHotplug: 4523 times remaining
> > > [ 5956.769790] CPU1: shutdown
> > > [ 5956.772518] psci: CPU1 killed.
> > > [ 5956.809752] CPU2: shutdown
> > > [ 5956.812480] psci: CPU2 killed.
> > > [ 5956.849769] CPU3: shutdown
> > > [ 5956.852494] psci: CPU3 killed.
> > > [ 5957.882045] Detected VIPT I-cache on CPU1 [ 5957.882089] GICv3:
> > > CPU1: found redistributor 1 region 0:0x0000000051b20000 [
> > > 5957.882153] CPU1: Booted secondary processor 0x0000000001
> > > [0x410fd042]
> > >
> > >
> > > > Looping here, no hang, and the debug console still responds. If I
> > > > attach JTAG, I can see CPU1 busy-waiting for the irq_work to become
> > > > free.
> >
> > Well, cpufreq_offline() calls cpufreq_stop_governor() too, so there
> > shouldn't be any pending irq_works coming from cpufreq on the offline
> > CPUs after that.
> >
> > Hence, if an irq_work is pending at the cpufreq_online() time, it must
> > be on CPU0 (which is always online).
> 
> Let me rephrase this: If an irq_work is pending at the
> cpufreq_online() time, it must be on an online CPU, which is CPU0 if all of the
> other CPUs are offline.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-22  5:15         ` Anson Huang
@ 2019-11-22  9:59           ` Rafael J. Wysocki
  2019-11-25  6:05             ` Anson Huang
  0 siblings, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-11-22  9:59 UTC (permalink / raw)
  To: Anson Huang; +Cc: Rafael J. Wysocki, Viresh Kumar, Jacky Bai, linux-pm

On Fri, Nov 22, 2019 at 6:15 AM Anson Huang <anson.huang@nxp.com> wrote:
>
> Hi, Rafael
>         Theoretically, yes, the CPU going offline will run the irq work list to make sure any irq work pending on it is cleared, but in practice that is NOT what happens,

So this looks like a problem with irq_work_sync() working not as expected.

>         both the ondemand and schedutil governors can reproduce this issue when running the CPU hotplug stress test.
>         I tried adding an "int cpu" field to the irq work structure to record the number of the CPU that has irq work pending. When the issue happens, I can see the irq work is pending on CPU #3, which is already offline. This is why the issue happens, but I don't know how it gets into that state...
>
> diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
> index b11fcdf..f8da06f9 100644
> --- a/include/linux/irq_work.h
> +++ b/include/linux/irq_work.h
> @@ -25,6 +25,7 @@ struct irq_work {
>         unsigned long flags;
>         struct llist_node llnode;
>         void (*func)(struct irq_work *);
> +       int cpu;
>  };
>
>  static inline
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index d42acaf..2e893d5 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -10,6 +10,7 @@
>  #include <linux/kernel.h>
>  #include <linux/export.h>
>  #include <linux/irq_work.h>
> +#include <linux/jiffies.h>
>  #include <linux/percpu.h>
>  #include <linux/hardirq.h>
>  #include <linux/irqflags.h>
> @@ -78,6 +79,7 @@ bool irq_work_queue(struct irq_work *work)
>         if (!irq_work_claim(work))
>                 return false;
>
> +       work->cpu = smp_processor_id();
>         /* Queue the entry and raise the IPI if needed. */
>         preempt_disable();
>         __irq_work_queue_local(work);
> @@ -105,6 +107,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
>         /* Only queue if not already pending */
>         if (!irq_work_claim(work))
>                 return false;
> +       work->cpu = cpu;
>
>         preempt_disable();
>         if (cpu != smp_processor_id()) {
> @@ -161,6 +164,7 @@ static void irq_work_run_list(struct llist_head *list)
>                  */
>                 flags = work->flags & ~IRQ_WORK_PENDING;
>                 xchg(&work->flags, flags);
> +               work->cpu = -1;
>
>                 work->func(work);
>                 /*
> @@ -197,9 +201,13 @@ void irq_work_tick(void)
>   */
>  void irq_work_sync(struct irq_work *work)
>  {
> +       unsigned long timeout = jiffies + msecs_to_jiffies(500);
>         lockdep_assert_irqs_enabled();

Can you please add something like

pr_info("%s: CPU %d\n", __func__, work->cpu);

here, re-run the test, and collect a log again?

I need to know if irq_work_sync() runs during CPU offline as expected.

>
> -       while (work->flags & IRQ_WORK_BUSY)
> +       while (work->flags & IRQ_WORK_BUSY) {
> +               if (time_after(jiffies, timeout))
> +                       pr_warn("irq_work_sync 500ms timeout, work cpu %d\n", work->cpu);
>                 cpu_relax();
> +       }
>  }
>  EXPORT_SYMBOL_GPL(irq_work_sync);
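
The flag handling visible in the diff context above (claim before queueing, clear IRQ_WORK_PENDING before calling work->func(), spin in irq_work_sync() while IRQ_WORK_BUSY is set) can be modelled in userspace to show why a work item that is claimed but never run keeps irq_work_sync() spinning. This is a simplified sketch, not the kernel implementation: struct model_work and the model_*() helpers are hypothetical names, the flag values are assumed, and C11 atomics stand in for the kernel's primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Flag names follow the quoted kernel code; the numeric values are assumed. */
#define IRQ_WORK_PENDING 1UL
#define IRQ_WORK_BUSY    2UL
#define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY)

/* Hypothetical userspace stand-in for struct irq_work. */
struct model_work {
	atomic_ulong flags;
	int cpu;   /* debug field from the patch: target CPU, -1 once run */
	int runs;  /* how many times the callback has run */
};

static void model_init(struct model_work *w)
{
	atomic_init(&w->flags, 0UL);
	w->cpu = -1;
	w->runs = 0;
}

/* Queue side: succeed only if the work is not already pending. */
static bool model_claim(struct model_work *w, int cpu)
{
	unsigned long old = atomic_fetch_or(&w->flags, IRQ_WORK_CLAIMED);

	if (old & IRQ_WORK_PENDING)
		return false;
	w->cpu = cpu;
	return true;
}

/* Run side: what the target CPU does when it processes its list. */
static void model_run(struct model_work *w)
{
	assert(atomic_load(&w->flags) & IRQ_WORK_PENDING);
	atomic_fetch_and(&w->flags, ~IRQ_WORK_PENDING); /* allow re-queueing */
	w->cpu = -1;
	w->runs++;                                      /* stands in for work->func(work) */
	atomic_fetch_and(&w->flags, ~IRQ_WORK_BUSY);    /* release waiters */
}

/* Sync side: true if the wait loop would exit, false if it would keep spinning. */
static bool model_sync_would_return(struct model_work *w)
{
	return !(atomic_load(&w->flags) & IRQ_WORK_BUSY);
}
```

Claiming the work for CPU 3 and never calling model_run() leaves model_sync_would_return() false indefinitely, which matches the reported symptom. The model says nothing about how the real code ends up in that state.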

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-22  9:59           ` Rafael J. Wysocki
@ 2019-11-25  6:05             ` Anson Huang
  2019-11-25  9:43               ` Anson Huang
  2019-11-25 12:44               ` Rafael J. Wysocki
  0 siblings, 2 replies; 57+ messages in thread
From: Anson Huang @ 2019-11-25  6:05 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, Jacky Bai, linux-pm

Hi, Rafael
	Looks like adding pr_info() in irq_work_sync() makes the issue NOT reproducible. Is it possible that a race happens there and the pr_info() eliminates the race condition? I will continue to run the test with the pr_info() to see if I have any luck reproducing it.

Anson

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-25  6:05             ` Anson Huang
@ 2019-11-25  9:43               ` Anson Huang
  2019-11-26  6:18                 ` Viresh Kumar
  2019-11-25 12:44               ` Rafael J. Wysocki
  1 sibling, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-11-25  9:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, Jacky Bai, linux-pm

Hi, Rafael
	I tried logging the necessary info into DRAM instead of calling pr_info() directly, and then the issue can easily be reproduced again. When the issue happened, the log shows that the last irq_work_sync() did NOT finish correctly. Below are the log and the patch I added. The test case simply repeats removing CPU1/CPU2/CPU3 and then adding back CPU1/CPU2/CPU3.
	When the issue happens, the log below shows that in the last round of removing CPUs, irq_work_sync() worked as expected for CPU1/CPU2/CPU3 (work->cpu is -1). Then, when CPU1 was added back, the irq_work flag was pending/busy on CPU1, and the issue happened:

[  589.121091] timeout, i 2936, 1 -1 2 -1 3 -1 1 3

LOG:
CPUHotplug: 4758 times remaining
[  582.829724] CPU1: shutdown
[  582.832453] psci: CPU1 killed.
[  582.869673] CPU2: shutdown
[  582.872427] psci: CPU2 killed.
[  582.909674] CPU3: shutdown
[  582.912402] psci: CPU3 killed.
[  583.946543] Detected VIPT I-cache on CPU1
[  583.946586] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  583.946651] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  583.980754] Detected VIPT I-cache on CPU2
[  583.980791] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[  583.980850] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[  584.029700] Detected VIPT I-cache on CPU3
[  584.029741] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[  584.029807] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4757 times remaining
[  585.141718] CPU1: shutdown
[  585.144439] psci: CPU1 killed.
[  585.193683] CPU2: shutdown
[  585.196408] psci: CPU2 killed.
[  585.241680] CPU3: shutdown
[  585.244406] psci: CPU3 killed.
[  586.273827] Detected VIPT I-cache on CPU1
[  586.273866] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  586.273926] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  586.308727] Detected VIPT I-cache on CPU2
[  586.308765] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[  586.308826] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[  586.357685] Detected VIPT I-cache on CPU3
[  586.357728] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[  586.357791] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4756 times remaining
[  587.469734] CPU1: shutdown
[  587.472455] psci: CPU1 killed.
[  587.509680] CPU2: shutdown
[  587.512408] psci: CPU2 killed.
[  587.549685] CPU3: shutdown
[  587.552405] psci: CPU3 killed.
[  588.585770] Detected VIPT I-cache on CPU1
[  588.585814] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  588.585878] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  589.121091] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.125647] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.130203] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.134758] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.139315] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.143877] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.148431] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.152991] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.157550] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.162113] timeout, i 2936, 1 -1 2 -1 3 -1 1 3
[  589.166673] timeout, i 2936, 1 -1 2 -1 3 -1 1 3

Patch:
+++ b/kernel/irq_work.c
@@ -10,6 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <linux/irq_work.h>
+#include <linux/jiffies.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
 #include <linux/irqflags.h>
@@ -23,7 +24,7 @@

 static DEFINE_PER_CPU(struct llist_head, raised_list);
 static DEFINE_PER_CPU(struct llist_head, lazy_list);
-
+static int log[500 * 8 * 1024];
 /*
  * Claim the entry so that no one else will poke at it.
  */
@@ -78,6 +79,7 @@ bool irq_work_queue(struct irq_work *work)
        if (!irq_work_claim(work))
                return false;

+       work->cpu = smp_processor_id();
        /* Queue the entry and raise the IPI if needed. */
        preempt_disable();
        __irq_work_queue_local(work);
@@ -105,6 +107,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu)
        /* Only queue if not already pending */
        if (!irq_work_claim(work))
                return false;
+       work->cpu = cpu;

        preempt_disable();
        if (cpu != smp_processor_id()) {
@@ -161,6 +164,7 @@ static void irq_work_run_list(struct llist_head *list)
                 */
                flags = work->flags & ~IRQ_WORK_PENDING;
                xchg(&work->flags, flags);
+               work->cpu = -1;

                work->func(work);
                /*
@@ -197,9 +201,22 @@ void irq_work_tick(void)
  */
 void irq_work_sync(struct irq_work *work)
 {
+       unsigned long timeout = jiffies + msecs_to_jiffies(500);
+       static int i = 0;
+
        lockdep_assert_irqs_enabled();

-       while (work->flags & IRQ_WORK_BUSY)
+       log[i++] = smp_processor_id();
+       log[i++] = work->cpu;
+
+       while (work->flags & IRQ_WORK_BUSY) {
+               if (time_after(jiffies, timeout))
+                       pr_warn("timeout, i %d, %d %d %d %d %d %d %d %d\n",
+                               i, log[i - 8], log[i - 7], log[i - 6], log[i - 5],
+                               log[i - 4], log[i - 3], log[i - 2], log[i - 1]);
                cpu_relax();
+       }
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);
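
The patch above records into a plain array with an ever-growing index, so the index can run past the end of the array and log[i - 8] can reach before its start early on. A bounded variant of the same idea might look like the sketch below. It is an illustration under assumptions, not part of the patch: struct trace_ring, trace_put(), trace_last() and TRACE_CAP are hypothetical names, and like the original static index it does no locking, so concurrent writers would need extra care.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_CAP 1024u  /* assumed capacity; must be a power of two */

/* Hypothetical in-memory trace, cheap enough to avoid the timing
 * disturbance a console print can cause. */
struct trace_ring {
	int slot[TRACE_CAP];
	unsigned int head;  /* total number of entries written so far */
};

/* Record one value, overwriting the oldest entry once the ring is full. */
static void trace_put(struct trace_ring *r, int value)
{
	r->slot[r->head & (TRACE_CAP - 1)] = value;
	r->head++;
}

/* Copy the newest n entries (oldest first) into out; returns how many were copied. */
static size_t trace_last(const struct trace_ring *r, int *out, size_t n)
{
	size_t avail = r->head < TRACE_CAP ? r->head : TRACE_CAP;
	size_t i;

	assert(out != NULL);
	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		out[i] = r->slot[(r->head - n + i) & (TRACE_CAP - 1)];
	return n;
}
```

With this shape, the timeout path would call trace_last() for eight entries and print only what was actually returned, instead of indexing backwards from the write position.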

Thanks,
Anson

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-25  6:05             ` Anson Huang
  2019-11-25  9:43               ` Anson Huang
@ 2019-11-25 12:44               ` Rafael J. Wysocki
  2019-11-26  8:57                 ` Rafael J. Wysocki
  2019-11-29 11:39                 ` Rafael J. Wysocki
  1 sibling, 2 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-11-25 12:44 UTC (permalink / raw)
  To: Anson Huang; +Cc: Rafael J. Wysocki, Viresh Kumar, Jacky Bai, linux-pm

On Mon, Nov 25, 2019 at 7:05 AM Anson Huang <anson.huang@nxp.com> wrote:
>
> Hi, Rafael
>         Looks like adding pr_info() in irq_work_sync() makes the issue NOT reproducible. Is it possible that a race happens there and the pr_info() eliminates the race condition? I will continue to run the test with the pr_info() to see if I have any luck reproducing it.

Yes, it looks like there is a race condition in there.

I need to analyze the code a bit to confirm it which may take a bit of time.

Cheers!


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-25  9:43               ` Anson Huang
@ 2019-11-26  6:18                 ` Viresh Kumar
  2019-11-26  8:22                   ` Anson Huang
  0 siblings, 1 reply; 57+ messages in thread
From: Viresh Kumar @ 2019-11-26  6:18 UTC (permalink / raw)
  To: Anson Huang; +Cc: Rafael J. Wysocki, Jacky Bai, linux-pm

On 25-11-19, 09:43, Anson Huang wrote:
> Hi, Rafael
> 	I tried logging the necessary info into DRAM instead of calling
> 	pr_info() directly, and then the issue can easily be reproduced again.
> 	When the issue happened, the log shows that the last irq_work_sync()
> 	did NOT finish correctly. Below are the log and the patch I added. The
> 	test case simply repeats removing CPU1/CPU2/CPU3 and then adding back
> 	CPU1/CPU2/CPU3.
> 	When the issue happens, the log below shows that in the last round of
> 	removing CPUs, irq_work_sync() worked as expected for CPU1/CPU2/CPU3
> 	(work->cpu is -1). Then, when CPU1 was added back, the irq_work flag
> 	was pending/busy on CPU1, and the issue happened:

FWIW, I tried to reproduce it on my hikey board and I couldn't even after hours
of testing :(

-- 
viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-26  6:18                 ` Viresh Kumar
@ 2019-11-26  8:22                   ` Anson Huang
  2019-11-26  8:25                     ` Viresh Kumar
  0 siblings, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-11-26  8:22 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Rafael J. Wysocki, Jacky Bai, linux-pm

Hi, Viresh

> On 25-11-19, 09:43, Anson Huang wrote:
> > Hi, Rafael
> > 	I tried logging the necessary info into DRAM instead of calling
> > 	pr_info() directly, and then the issue can easily be reproduced again.
> > 	When the issue happened, the log shows that the last irq_work_sync()
> > 	did NOT finish correctly. Below are the log and the patch I added. The
> > 	test case simply repeats removing CPU1/CPU2/CPU3 and then adding back
> > 	CPU1/CPU2/CPU3.
> > 	When the issue happens, the log below shows that in the last round of
> > 	removing CPUs, irq_work_sync() worked as expected for CPU1/CPU2/CPU3
> > 	(work->cpu is -1). Then, when CPU1 was added back, the irq_work flag
> > 	was pending/busy on CPU1, and the issue happened:
> 
> FWIW, I tried to reproduce it on my hikey board and I couldn't even after
> hours of testing :(

Did you use ondemand governor? By default the governor is performance and
it has no issue.

Thanks,
Anson

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-26  8:22                   ` Anson Huang
@ 2019-11-26  8:25                     ` Viresh Kumar
  0 siblings, 0 replies; 57+ messages in thread
From: Viresh Kumar @ 2019-11-26  8:25 UTC (permalink / raw)
  To: Anson Huang; +Cc: Rafael J. Wysocki, Jacky Bai, linux-pm

On 26-11-19, 08:22, Anson Huang wrote:
> Hi, Viresh
> 
> > On 25-11-19, 09:43, Anson Huang wrote:
> > > Hi, Rafael
> > > 	I tried logging the necessary info into DRAM instead of calling
> > > 	pr_info() directly, and then the issue can easily be reproduced again.
> > > 	When the issue happened, the log shows that the last irq_work_sync()
> > > 	did NOT finish correctly. Below are the log and the patch I added. The
> > > 	test case simply repeats removing CPU1/CPU2/CPU3 and then adding back
> > > 	CPU1/CPU2/CPU3.
> > > 	When the issue happens, the log below shows that in the last round of
> > > 	removing CPUs, irq_work_sync() worked as expected for CPU1/CPU2/CPU3
> > > 	(work->cpu is -1). Then, when CPU1 was added back, the irq_work flag
> > > 	was pending/busy on CPU1, and the issue happened:
> > 
> > FWIW, I tried to reproduce it on my hikey board and I couldn't even after
> > hours of testing :(
> 
> Did you use ondemand governor? By default the governor is performance and
> it has no issue.

Yes, governor was set to ondemand and then I ran this script, which permanently
removes CPU 4-7 and then keeps offlining/onlining CPU 1-3:

root@linaro-developer:~/work# cat irqwork.sh 
#!/bin/bash

cd /sys/devices/system/cpu/

for i in `seq 4 7`
do
        echo 0 > cpu$i/online
done

while true
do
        for i in `seq 1 3`
        do
                echo 0 > cpu$i/online
        done

        for i in `seq 1 3`
        do
                echo 1 > cpu$i/online
        done
done


-- 
viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-25 12:44               ` Rafael J. Wysocki
@ 2019-11-26  8:57                 ` Rafael J. Wysocki
  2019-11-29 11:39                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-11-26  8:57 UTC (permalink / raw)
  To: Anson Huang; +Cc: Viresh Kumar, Jacky Bai, linux-pm

On Mon, Nov 25, 2019 at 1:44 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Nov 25, 2019 at 7:05 AM Anson Huang <anson.huang@nxp.com> wrote:
> >
> > Hi, Rafael
> >         Looks like adding pr_info() in irq_work_sync() makes the issue NOT reproducible. Is it possible that a race happens there and the pr_info() eliminates the race condition? I will continue to run the test with the pr_info() to see if I have any luck reproducing it.
>
> Yes, it looks like there is a race condition in there.

I'm also thinking about a nasty enough compiler optimization in irq_work_sync().

Do you use LLVM by any chance?  If so, can you put a READ_ONCE()
around the work->flags read in there and see if that makes any
difference?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-25 12:44               ` Rafael J. Wysocki
  2019-11-26  8:57                 ` Rafael J. Wysocki
@ 2019-11-29 11:39                 ` Rafael J. Wysocki
  2019-11-29 13:44                   ` Anson Huang
  1 sibling, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-11-29 11:39 UTC (permalink / raw)
  To: Anson Huang; +Cc: Viresh Kumar, Jacky Bai, linux-pm

On Monday, November 25, 2019 1:44:20 PM CET Rafael J. Wysocki wrote:
> On Mon, Nov 25, 2019 at 7:05 AM Anson Huang <anson.huang@nxp.com> wrote:
> >
> > Hi, Rafael
> >         Looks like adding pr_info() in irq_work_sync() makes the issue NOT reproducible. Is it possible that a race happens there and the pr_info() eliminates the race condition? I will continue to run the test with the pr_info() to see if I have any luck reproducing it.
> 
> Yes, it looks like there is a race condition in there.
> 
> I need to analyze the code a bit to confirm it which may take a bit of time.

I'm not seeing any races in there except for the possible over-optimization
of irq_work_sync() that I was talking about before.

Cheers!




^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-29 11:39                 ` Rafael J. Wysocki
@ 2019-11-29 13:44                   ` Anson Huang
  2019-12-05  8:53                     ` Anson Huang
  0 siblings, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-11-29 13:44 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, Jacky Bai, linux-pm



> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Monday, November 25, 2019 1:44:20 PM CET Rafael J. Wysocki wrote:
> > On Mon, Nov 25, 2019 at 7:05 AM Anson Huang <anson.huang@nxp.com>
> wrote:
> > >
> > > Hi, Rafael
> > >         Looks like adding pr_info() in irq_work_sync() makes issue can NOT
> be reproduced, any possibility of race happen there and the pr_info
> eliminate the race condition? I will continue run the test with the pr_info to
> see if any luck to reproduce it.
> >
> > Yes, it looks like there is a race condition in there.
> >
> > I need to analyze the code a bit to confirm it which may take a bit of time.
> 
> I'm not seeing any races in there except for the possible over-optimization of
> irq_work_sync() that I was talking about before.
> 
> Cheers!

Thanks,
Then I have to debug it myself, many i.MX platforms CPU hot-plug are broken
suddenly, while old kernel versions (4.19) are running just fine. 

Anson


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-11-29 13:44                   ` Anson Huang
@ 2019-12-05  8:53                     ` Anson Huang
  2019-12-05 10:48                       ` Rafael J. Wysocki
  2019-12-05 11:00                       ` Viresh Kumar
  0 siblings, 2 replies; 57+ messages in thread
From: Anson Huang @ 2019-12-05  8:53 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Viresh Kumar, Jacky Bai, linux-pm

Hi, Rafael
	This issue is very strange. The irq_work used in cpufreq_governor.c is very simple: there is ONLY one place that claims it, and it is a private irq_work structure that no other driver uses. I added some trace events to cpufreq_governor.c and irq_work.c. Every time, the issue happens at the point where CPU1/2/3 are all offline and CPU1 starts to come online: when CPU1 tries to sync the irq_work in cpufreq_dbs_governor_stop(), the irq_work shows that the previous work is still pending on CPU3, which is offline. I also have a trace event in irq_work_claim(), but no log entry shows the cpufreq_governor irq_work being claimed on CPU3 after CPU3 went offline. Below are the debug patch I added and the logs from the 2 consoles.
	If I understand it correctly, the irq_work used in cpufreq_governor has ONLY one call site for irq_work_queue(), and the work is claimed ONLY on the CPU that calls irq_work_queue(). From the trace result, however, I do NOT see where CPU3 could have called irq_work_queue() after it finished the irq_work sync before going offline. Could something be wrong with cache maintenance during CPU hotplug?
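
To make the claim/run handshake described above easier to follow, here is a
minimal userspace sketch of the flag transitions, modelled on the
irq_work_claim() and irq_work_run_list() fragments visible in the patch
below. It uses C11 atomics in place of the kernel's cmpxchg()/xchg(), and
work_sketch, claim_sketch() and run_sketch() are hypothetical names, so treat
it as an approximation rather than the actual implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define IRQ_WORK_PENDING 0x1UL
#define IRQ_WORK_BUSY    0x2UL
#define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY)

/* Hypothetical, reduced version of struct irq_work. */
struct work_sketch {
	atomic_ulong flags;
	int runs;	/* counts how often the "callback" ran */
};

/* Try to claim the work; fails if somebody else already has it pending. */
static bool claim_sketch(struct work_sketch *work)
{
	/* Start from the assumption that the work is not pending. */
	unsigned long flags = atomic_load(&work->flags) & ~IRQ_WORK_PENDING;

	for (;;) {
		unsigned long expected = flags;

		if (atomic_compare_exchange_strong(&work->flags, &expected,
						   flags | IRQ_WORK_CLAIMED))
			return true;
		/* On failure, expected holds the value actually found. */
		if (expected & IRQ_WORK_PENDING)
			return false;
		flags = expected;
	}
}

/* Run a claimed work: clear PENDING, "run" it, then try to clear BUSY. */
static void run_sketch(struct work_sketch *work)
{
	unsigned long flags = atomic_load(&work->flags) & ~IRQ_WORK_PENDING;
	unsigned long expected = flags;

	atomic_exchange(&work->flags, flags);
	work->runs++;	/* stands in for work->func(work) */
	/* BUSY is only cleared if nobody re-claimed the work meanwhile. */
	atomic_compare_exchange_strong(&work->flags, &expected,
				       flags & ~IRQ_WORK_BUSY);
}
```

In this model a successful claim leaves flags at 3 and a completed run leaves
them at 0, which matches the flag=3 / flag=0 pairs in the trace log.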

PATCH:
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 4bb054d..90d86dd 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -16,6 +16,7 @@
 #include <linux/export.h>
 #include <linux/kernel_stat.h>
 #include <linux/slab.h>
+#include <trace/events/power.h>

 #include "cpufreq_governor.h"

@@ -316,6 +317,8 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,

        policy_dbs->last_sample_time = time;
        policy_dbs->work_in_progress = true;
+       trace_cpu_frequency_irq_work(smp_processor_id(),
+               policy_dbs->irq_work.cpu);
        irq_work_queue(&policy_dbs->irq_work);
 }

@@ -501,7 +504,6 @@ void cpufreq_dbs_governor_exit(struct cpufreq_policy *policy)
        mutex_unlock(&gov_dbs_data_mutex);
 }
 EXPORT_SYMBOL_GPL(cpufreq_dbs_governor_exit);
-
 int cpufreq_dbs_governor_start(struct cpufreq_policy *policy)
 {
        struct dbs_governor *gov = dbs_governor_of(policy);
@@ -522,7 +524,6 @@ int cpufreq_dbs_governor_start(struct cpufreq_policy *policy)

        for_each_cpu(j, policy->cpus) {
                struct cpu_dbs_info *j_cdbs = &per_cpu(cpu_dbs, j);
-
                j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_update_time, io_busy);
                /*
                 * Make the first invocation of dbs_update() compute the load.
@@ -545,6 +546,8 @@ void cpufreq_dbs_governor_stop(struct cpufreq_policy *policy)
        struct policy_dbs_info *policy_dbs = policy->governor_data;

        gov_clear_update_util(policy_dbs->policy);
+       trace_cpu_frequency_irq_work_sync(smp_processor_id(),
+               policy_dbs->irq_work.cpu, policy_dbs->irq_work.flags);
        irq_work_sync(&policy_dbs->irq_work);
        cancel_work_sync(&policy_dbs->work);
        atomic_set(&policy_dbs->work_count, 0);
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index b11fcdf..f8da06f9 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -25,6 +25,7 @@ struct irq_work {
        unsigned long flags;
        struct llist_node llnode;
        void (*func)(struct irq_work *);
+       int cpu;
 };

 static inline
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 7457e23..89e801b 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -173,6 +173,93 @@ TRACE_EVENT(cpu_frequency_limits,
                  (unsigned long)__entry->cpu_id)
 );

+TRACE_EVENT(cpu_frequency_irq_work,
+
+       TP_PROTO(int cpu_id, int cpu),
+
+       TP_ARGS(cpu_id, cpu),
+
+       TP_STRUCT__entry(
+               __field(int, cpu_id)
+               __field(int, cpu)
+       ),
+
+       TP_fast_assign(
+               __entry->cpu_id = cpu_id;
+               __entry->cpu = cpu;
+       ),
+
+       TP_printk("cpu_id=%d, cpu=%d",
+                 __entry->cpu_id,
+                 __entry->cpu)
+);
+
+TRACE_EVENT(cpu_frequency_irq_work_sync,
+
+       TP_PROTO(int cpu_id, int cpu, u32 flag),
+
+       TP_ARGS(cpu_id, cpu, flag),
+
+       TP_STRUCT__entry(
+               __field(int, cpu_id)
+               __field(int, cpu)
+               __field(u32, flag)
+       ),
+
+       TP_fast_assign(
+               __entry->cpu_id = cpu_id;
+               __entry->cpu = cpu;
+               __entry->flag = flag;
+       ),
+
+       TP_printk("cpu_id=%d, cpu=%d, flag=%lu",
+                 __entry->cpu_id,
+                 __entry->cpu,
+                 (unsigned long)__entry->flag)
+);
+
+TRACE_EVENT(cpu_frequency_irq_run_list,
+
+       TP_PROTO(int cpu_id, u32 flag),
+
+       TP_ARGS(cpu_id, flag),
+
+       TP_STRUCT__entry(
+               __field(u32, cpu_id)
+               __field(u32, flag)
+       ),
+
+       TP_fast_assign(
+               __entry->cpu_id = cpu_id;
+               __entry->flag = flag;
+       ),
+
+       TP_printk("cpu_id=%lu, flag=%lu",
+                 (unsigned long)__entry->cpu_id,
+                 (unsigned long)__entry->flag)
+);
+
+TRACE_EVENT(cpu_frequency_irq_claim,
+
+       TP_PROTO(int cpu_id, u32 flag),
+
+       TP_ARGS(cpu_id, flag),
+
+       TP_STRUCT__entry(
+               __field(u32, cpu_id)
+               __field(u32, flag)
+       ),
+
+       TP_fast_assign(
+               __entry->cpu_id = cpu_id;
+               __entry->flag = flag;
+       ),
+
+       TP_printk("cpu_id=%lu, flag=%lu",
+                 (unsigned long)__entry->cpu_id,
+                 (unsigned long)__entry->flag)
+);
+
 TRACE_EVENT(device_pm_callback_start,

        TP_PROTO(struct device *dev, const char *pm_ops, int event),
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index d42acaf..0c13608 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -19,6 +19,7 @@
 #include <linux/notifier.h>
 #include <linux/smp.h>
 #include <asm/processor.h>
+#include <trace/events/power.h>


 static DEFINE_PER_CPU(struct llist_head, raised_list);
@@ -39,13 +40,20 @@ static bool irq_work_claim(struct irq_work *work)
        for (;;) {
                nflags = flags | IRQ_WORK_CLAIMED;
                oflags = cmpxchg(&work->flags, flags, nflags);
                if (oflags == flags)
                        break;
-               if (oflags & IRQ_WORK_PENDING)
+               if (oflags & IRQ_WORK_PENDING) {
+                       work->cpu = smp_processor_id();
+                       trace_cpu_frequency_irq_claim(smp_processor_id(),
+                               work->flags | 0xf0000000);
                        return false;
+               }
                flags = oflags;
                cpu_relax();
        }
+       work->cpu = smp_processor_id();
+       trace_cpu_frequency_irq_claim(smp_processor_id(), work->flags);

        return true;
 }
@@ -159,6 +167,7 @@ static void irq_work_run_list(struct llist_head *list)
                 * to claim that work don't rely on us to handle their data
                 * while we are in the middle of the func.
                 */
+               work->cpu = -1;
                flags = work->flags & ~IRQ_WORK_PENDING;
                xchg(&work->flags, flags);

@@ -168,6 +177,9 @@ static void irq_work_run_list(struct llist_head *list)
                 * no-one else claimed it meanwhile.
                 */
                (void)cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
+               trace_cpu_frequency_irq_run_list(smp_processor_id(),
+                       work->flags);
        }
 }


LOG on console 1, which runs the CPU1/2/3 offline/online stress test:
CPUHotplug: 4575 times remaining
[ 1047.401185] CPU1: shutdown
[ 1047.403917] psci: CPU1 killed.
[ 1047.449153] CPU2: shutdown
[ 1047.451880] psci: CPU2 killed.
[ 1047.501131] CPU3: shutdown
[ 1047.503857] psci: CPU3 killed.
[ 1048.541939] Detected VIPT I-cache on CPU1
[ 1048.541983] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[ 1048.542050] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[ 1048.585024] Detected VIPT I-cache on CPU2
[ 1048.585061] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[ 1048.585121] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[ 1048.645070] Detected VIPT I-cache on CPU3
[ 1048.645112] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[ 1048.645181] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4574 times remaining
[ 1049.769187] CPU1: shutdown
[ 1049.771913] psci: CPU1 killed.
[ 1049.809126] CPU2: shutdown
[ 1049.811856] psci: CPU2 killed.
[ 1049.853135] CPU3: shutdown
[ 1049.855868] psci: CPU3 killed.

Waiting here forever.....

LOG on console 2, which has the trace events I added above enabled:
             sed-4591  [003] d..4  1049.705561: cpu_frequency_irq_claim: cpu_id=3, flag=3
             sed-4591  [003] dNh1  1049.705604: cpu_frequency_irq_run_list: cpu_id=3, flag=0
          <idle>-0     [001] d.s2  1049.716308: cpu_frequency_irq_work: cpu_id=1, cpu=-1
          <idle>-0     [001] d.s2  1049.716319: cpu_frequency_irq_claim: cpu_id=1, flag=3
          <idle>-0     [001] dNH2  1049.716338: cpu_frequency_irq_run_list: cpu_id=1, flag=0
          <idle>-0     [002] d.s2  1049.728303: cpu_frequency_irq_work: cpu_id=2, cpu=-1
          <idle>-0     [002] d.s2  1049.728307: cpu_frequency_irq_claim: cpu_id=2, flag=3
          <idle>-0     [002] dNH2  1049.728320: cpu_frequency_irq_run_list: cpu_id=2, flag=0
          <idle>-0     [001] d.s2  1049.740305: cpu_frequency_irq_work: cpu_id=1, cpu=-1
          <idle>-0     [001] d.s2  1049.740307: cpu_frequency_irq_claim: cpu_id=1, flag=3
          <idle>-0     [001] dNH2  1049.740319: cpu_frequency_irq_run_list: cpu_id=1, flag=0
          <idle>-0     [001] d.s2  1049.752305: cpu_frequency_irq_work: cpu_id=1, cpu=-1
          <idle>-0     [001] d.s2  1049.752307: cpu_frequency_irq_claim: cpu_id=1, flag=3
          <idle>-0     [001] dNH2  1049.752316: cpu_frequency_irq_run_list: cpu_id=1, flag=0
         cpuhp/1-13    [001] ....  1049.768340: cpu_frequency_irq_work_sync: cpu_id=1, cpu=-1, flag=0
         cpuhp/1-13    [001] d..4  1049.768681: cpu_frequency_irq_work: cpu_id=1, cpu=-1
         cpuhp/1-13    [001] d..4  1049.768683: cpu_frequency_irq_claim: cpu_id=1, flag=3
         cpuhp/1-13    [001] dNh1  1049.768698: cpu_frequency_irq_run_list: cpu_id=1, flag=0
     smp_test.sh-734   [000] ...1  1049.771903: cpu_frequency_irq_claim: cpu_id=0, flag=7
     smp_test.sh-734   [000] dNh1  1049.775009: cpu_frequency_irq_run_list: cpu_id=0, flag=4
     smp_test.sh-734   [000] ...1  1049.776084: cpu_frequency_irq_claim: cpu_id=0, flag=7
     smp_test.sh-734   [000] dNh.  1049.776392: cpu_frequency_irq_run_list: cpu_id=0, flag=4
     smp_test.sh-734   [000] d..2  1049.779093: cpu_frequency_irq_work: cpu_id=0, cpu=-1
     smp_test.sh-734   [000] d..2  1049.779103: cpu_frequency_irq_claim: cpu_id=0, flag=3
          <idle>-0     [000] dNh2  1049.779162: cpu_frequency_irq_run_list: cpu_id=0, flag=0
          <idle>-0     [000] d.s2  1049.792305: cpu_frequency_irq_work: cpu_id=0, cpu=-1
          <idle>-0     [000] d.s2  1049.792315: cpu_frequency_irq_claim: cpu_id=0, flag=3
          <idle>-0     [000] dNH2  1049.792329: cpu_frequency_irq_run_list: cpu_id=0, flag=0
         cpuhp/2-18    [002] ....  1049.808315: cpu_frequency_irq_work_sync: cpu_id=2, cpu=-1, flag=0
         cpuhp/2-18    [002] d..4  1049.808642: cpu_frequency_irq_work: cpu_id=2, cpu=-1
         cpuhp/2-18    [002] d..4  1049.808645: cpu_frequency_irq_claim: cpu_id=2, flag=3
         cpuhp/2-18    [002] dNh1  1049.808658: cpu_frequency_irq_run_list: cpu_id=2, flag=0
     smp_test.sh-734   [000] ...1  1049.811848: cpu_frequency_irq_claim: cpu_id=0, flag=7
     smp_test.sh-734   [000] dNh1  1049.814949: cpu_frequency_irq_run_list: cpu_id=0, flag=4
     smp_test.sh-734   [000] ...1  1049.815988: cpu_frequency_irq_claim: cpu_id=0, flag=7
     smp_test.sh-734   [000] dNh1  1049.816321: cpu_frequency_irq_run_list: cpu_id=0, flag=4
     smp_test.sh-734   [000] d..3  1049.818936: cpu_frequency_irq_work: cpu_id=0, cpu=-1
     smp_test.sh-734   [000] d..3  1049.818946: cpu_frequency_irq_claim: cpu_id=0, flag=3
     smp_test.sh-734   [000] dNh2  1049.818973: cpu_frequency_irq_run_list: cpu_id=0, flag=0
          <idle>-0     [000] d.s4  1049.832308: cpu_frequency_irq_work: cpu_id=0, cpu=-1
          <idle>-0     [000] d.s4  1049.832317: cpu_frequency_irq_claim: cpu_id=0, flag=3
          <idle>-0     [000] dNH3  1049.832332: cpu_frequency_irq_run_list: cpu_id=0, flag=0
         cpuhp/3-23    [003] ....  1049.852314: cpu_frequency_irq_work_sync: cpu_id=3, cpu=-1, flag=0

[Anson] When CPU3 goes offline, the irq_work sync succeeds and no irq_work is pending any more.

     smp_test.sh-734   [000] ...1  1049.855859: cpu_frequency_irq_claim: cpu_id=0, flag=7
     smp_test.sh-734   [000] dNh1  1049.858958: cpu_frequency_irq_run_list: cpu_id=0, flag=4
     smp_test.sh-734   [000] ...1  1049.859990: cpu_frequency_irq_claim: cpu_id=0, flag=7
     smp_test.sh-734   [000] dNh.  1049.860346: cpu_frequency_irq_run_list: cpu_id=0, flag=4
          <idle>-0     [001] d.h1  1050.896329: cpu_frequency_irq_run_list: cpu_id=1, flag=4
         cpuhp/1-13    [001] ....  1050.916319: cpu_frequency_irq_work_sync: cpu_id=1, cpu=3, flag=3

[Anson] We can see that when CPU1 starts to come online and tries to sync the irq_work, it finds the work pending on CPU3, which is offline, and in this period no irq_work was claimed by cpufreq_governor.
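
As a reading aid for the flag= values in the log above, a small decoder can
be sketched as follows. The bit values are assumed to match
include/linux/irq_work.h in v5.4 (PENDING = bit 0, BUSY = bit 1, LAZY = bit 2)
and should be checked against the tree in use; decode_flags() is a
hypothetical helper, not a kernel function.

```c
#include <stdio.h>
#include <string.h>

/* Assumed to match include/linux/irq_work.h in v5.4. */
#define IRQ_WORK_PENDING 0x1UL
#define IRQ_WORK_BUSY    0x2UL
#define IRQ_WORK_LAZY    0x4UL

/*
 * Write a space-separated list of the flag names set in 'flags' into 'buf'
 * (or "idle" if none of the three is set) and return 'buf'.
 */
static const char *decode_flags(unsigned long flags, char *buf, size_t len)
{
	size_t n;

	if (len == 0)
		return buf;

	snprintf(buf, len, "%s%s%s",
		 (flags & IRQ_WORK_PENDING) ? "PENDING " : "",
		 (flags & IRQ_WORK_BUSY) ? "BUSY " : "",
		 (flags & IRQ_WORK_LAZY) ? "LAZY " : "");

	n = strlen(buf);
	if (n == 0)
		snprintf(buf, len, "idle");
	else if (buf[n - 1] == ' ')
		buf[n - 1] = '\0';	/* drop the trailing separator */

	return buf;
}
```

Under these assumptions, flag=3 reads as a freshly claimed work, flag=0 as an
idle one, and the flag=7 / flag=4 entries on CPU0 carry the LAZY bit, which
suggests they belong to some other, lazy irq_work rather than the governor's
(an inference from the bit values, not something confirmed in this thread).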

root@imx8qxpmek:~#




> -----Original Message-----
> From: Anson Huang
> Sent: Friday, November 29, 2019 9:45 PM
> To: 'Rafael J. Wysocki' <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>; Jacky Bai <ping.bai@nxp.com>;
> linux-pm@vger.kernel.org
> Subject: RE: About CPU hot-plug stress test failed in cpufreq driver
> 
> 
> 
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> >
> > On Monday, November 25, 2019 1:44:20 PM CET Rafael J. Wysocki wrote:
> > > On Mon, Nov 25, 2019 at 7:05 AM Anson Huang <anson.huang@nxp.com>
> > wrote:
> > > >
> > > > Hi, Rafael
> > > >         Looks like adding pr_info() in irq_work_sync() makes issue
> > > > can NOT
> > be reproduced, any possibility of race happen there and the pr_info
> > eliminate the race condition? I will continue run the test with the
> > pr_info to see if any luck to reproduce it.
> > >
> > > Yes, it looks like there is a race condition in there.
> > >
> > > I need to analyze the code a bit to confirm it which may take a bit of time.
> >
> > I'm not seeing any races in there except for the possible
> > over-optimization of
> > irq_work_sync() that I was talking about before.
> >
> > Cheers!
> 
> Thanks,
> Then I have to debug it myself, many i.MX platforms CPU hot-plug are broken
> suddenly, while old kernel versions (4.19) are running just fine.
> 
> Anson


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-05  8:53                     ` Anson Huang
@ 2019-12-05 10:48                       ` Rafael J. Wysocki
  2019-12-05 13:18                         ` Anson Huang
  2019-12-05 11:00                       ` Viresh Kumar
  1 sibling, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-05 10:48 UTC (permalink / raw)
  To: Anson Huang; +Cc: Viresh Kumar, Jacky Bai, linux-pm

On Thursday, December 5, 2019 9:53:20 AM CET Anson Huang wrote:
> Hi, Rafael
> 	This issue is very weird, the irq_work used in cpufreq_governor.c is very
> 	simple, ONLY one entry to claim the irq_work, and cpufreq_governor's irq_work
> 	is a private irq_work structure, no other drivers use it. I added some trace
> 	event in cpufreq_governor.c and irq_work.c, every time, the issue happened at
> 	the point of CPU1/2/3 all off, and CPU1 start ON line, but when CPU1 tried to
> 	sync the irq_work in cpufreq_dbs_governor_stop(), the irq_work shows that
> 	previous work is pending on CPU3 which is offline, I also had the trace event
> 	in irq_work_claim(),  but no any log shows the cpufreq_governor irq_work is
> 	claimed on CPU3 after CPU3 offline, below is the debug patch I added and the
> 	log on 2 consoles:
> 	If I understand it correctly, the irq work used in cpufreq_governor ONLY has
> 	one entry of calling irq_work_queue() which will be ONLY claimed on the CPU
> 	calling the irq_work_queue(), but from trace result, I have NOT see where
> 	CPU3 could call irq_work_queue() after it finishes the irq work sync before
> 	offline.

Right.

Which means that this particular irq_work only runs on the CPU that has
run irq_work_queue() for it.

> 	Could it something wrong related to cache maintain during CPU hotplug?

I'm not sure what is going on, but I do agree that it is weird enough. :-)

[cut]

> LOG on console 1 which does CPU1/2/3 offline and online stress test:
> CPUHotplug: 4575 times remaining
> [ 1047.401185] CPU1: shutdown
> [ 1047.403917] psci: CPU1 killed.
> [ 1047.449153] CPU2: shutdown
> [ 1047.451880] psci: CPU2 killed.
> [ 1047.501131] CPU3: shutdown
> [ 1047.503857] psci: CPU3 killed.
> [ 1048.541939] Detected VIPT I-cache on CPU1
> [ 1048.541983] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> [ 1048.542050] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> [ 1048.585024] Detected VIPT I-cache on CPU2
> [ 1048.585061] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> [ 1048.585121] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
> [ 1048.645070] Detected VIPT I-cache on CPU3
> [ 1048.645112] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> [ 1048.645181] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
> CPUHotplug: 4574 times remaining
> [ 1049.769187] CPU1: shutdown
> [ 1049.771913] psci: CPU1 killed.
> [ 1049.809126] CPU2: shutdown
> [ 1049.811856] psci: CPU2 killed.
> [ 1049.853135] CPU3: shutdown
> [ 1049.855868] psci: CPU3 killed.
> 
> Waiting here forever.....
> 
> LOG on console 2 which enables the trace events I added upper:
>              sed-4591  [003] d..4  1049.705561: cpu_frequency_irq_claim: cpu_id=3, flag=3
>              sed-4591  [003] dNh1  1049.705604: cpu_frequency_irq_run_list: cpu_id=3, flag=0

So here CPU3 runs an IRQ work, presumably the cpufreq governor's one.

After that its raised_list should be empty and it doesn't claim any IRQ works going
forward.

>           <idle>-0     [001] d.s2  1049.716308: cpu_frequency_irq_work: cpu_id=1, cpu=-1
>           <idle>-0     [001] d.s2  1049.716319: cpu_frequency_irq_claim: cpu_id=1, flag=3
>           <idle>-0     [001] dNH2  1049.716338: cpu_frequency_irq_run_list: cpu_id=1, flag=0

And now CPU1 runs the cpufreq governor IRQ work, so it sets work->cpu to 1 and
then to -1 (when flushing raised_list). 

>           <idle>-0     [002] d.s2  1049.728303: cpu_frequency_irq_work: cpu_id=2, cpu=-1
>           <idle>-0     [002] d.s2  1049.728307: cpu_frequency_irq_claim: cpu_id=2, flag=3
>           <idle>-0     [002] dNH2  1049.728320: cpu_frequency_irq_run_list: cpu_id=2, flag=0
>           <idle>-0     [001] d.s2  1049.740305: cpu_frequency_irq_work: cpu_id=1, cpu=-1
>           <idle>-0     [001] d.s2  1049.740307: cpu_frequency_irq_claim: cpu_id=1, flag=3
>           <idle>-0     [001] dNH2  1049.740319: cpu_frequency_irq_run_list: cpu_id=1, flag=0
>           <idle>-0     [001] d.s2  1049.752305: cpu_frequency_irq_work: cpu_id=1, cpu=-1
>           <idle>-0     [001] d.s2  1049.752307: cpu_frequency_irq_claim: cpu_id=1, flag=3
>           <idle>-0     [001] dNH2  1049.752316: cpu_frequency_irq_run_list: cpu_id=1, flag=0
>          cpuhp/1-13    [001] ....  1049.768340: cpu_frequency_irq_work_sync: cpu_id=1, cpu=-1, flag=0
>          cpuhp/1-13    [001] d..4  1049.768681: cpu_frequency_irq_work: cpu_id=1, cpu=-1
>          cpuhp/1-13    [001] d..4  1049.768683: cpu_frequency_irq_claim: cpu_id=1, flag=3
>          cpuhp/1-13    [001] dNh1  1049.768698: cpu_frequency_irq_run_list: cpu_id=1, flag=0
>      smp_test.sh-734   [000] ...1  1049.771903: cpu_frequency_irq_claim: cpu_id=0, flag=7
>      smp_test.sh-734   [000] dNh1  1049.775009: cpu_frequency_irq_run_list: cpu_id=0, flag=4
>      smp_test.sh-734   [000] ...1  1049.776084: cpu_frequency_irq_claim: cpu_id=0, flag=7
>      smp_test.sh-734   [000] dNh.  1049.776392: cpu_frequency_irq_run_list: cpu_id=0, flag=4
>      smp_test.sh-734   [000] d..2  1049.779093: cpu_frequency_irq_work: cpu_id=0, cpu=-1
>      smp_test.sh-734   [000] d..2  1049.779103: cpu_frequency_irq_claim: cpu_id=0, flag=3
>           <idle>-0     [000] dNh2  1049.779162: cpu_frequency_irq_run_list: cpu_id=0, flag=0
>           <idle>-0     [000] d.s2  1049.792305: cpu_frequency_irq_work: cpu_id=0, cpu=-1
>           <idle>-0     [000] d.s2  1049.792315: cpu_frequency_irq_claim: cpu_id=0, flag=3
>           <idle>-0     [000] dNH2  1049.792329: cpu_frequency_irq_run_list: cpu_id=0, flag=0
>          cpuhp/2-18    [002] ....  1049.808315: cpu_frequency_irq_work_sync: cpu_id=2, cpu=-1, flag=0
>          cpuhp/2-18    [002] d..4  1049.808642: cpu_frequency_irq_work: cpu_id=2, cpu=-1
>          cpuhp/2-18    [002] d..4  1049.808645: cpu_frequency_irq_claim: cpu_id=2, flag=3
>          cpuhp/2-18    [002] dNh1  1049.808658: cpu_frequency_irq_run_list: cpu_id=2, flag=0
>      smp_test.sh-734   [000] ...1  1049.811848: cpu_frequency_irq_claim: cpu_id=0, flag=7
>      smp_test.sh-734   [000] dNh1  1049.814949: cpu_frequency_irq_run_list: cpu_id=0, flag=4
>      smp_test.sh-734   [000] ...1  1049.815988: cpu_frequency_irq_claim: cpu_id=0, flag=7
>      smp_test.sh-734   [000] dNh1  1049.816321: cpu_frequency_irq_run_list: cpu_id=0, flag=4
>      smp_test.sh-734   [000] d..3  1049.818936: cpu_frequency_irq_work: cpu_id=0, cpu=-1
>      smp_test.sh-734   [000] d..3  1049.818946: cpu_frequency_irq_claim: cpu_id=0, flag=3
>      smp_test.sh-734   [000] dNh2  1049.818973: cpu_frequency_irq_run_list: cpu_id=0, flag=0
>           <idle>-0     [000] d.s4  1049.832308: cpu_frequency_irq_work: cpu_id=0, cpu=-1
>           <idle>-0     [000] d.s4  1049.832317: cpu_frequency_irq_claim: cpu_id=0, flag=3
>           <idle>-0     [000] dNH3  1049.832332: cpu_frequency_irq_run_list: cpu_id=0, flag=0
>          cpuhp/3-23    [003] ....  1049.852314: cpu_frequency_irq_work_sync: cpu_id=3, cpu=-1, flag=0
> 
> [Anson] when CPU3 offline, the irq work sync is successfully, no irq work pending any more;
> 
>      smp_test.sh-734   [000] ...1  1049.855859: cpu_frequency_irq_claim: cpu_id=0, flag=7
>      smp_test.sh-734   [000] dNh1  1049.858958: cpu_frequency_irq_run_list: cpu_id=0, flag=4
>      smp_test.sh-734   [000] ...1  1049.859990: cpu_frequency_irq_claim: cpu_id=0, flag=7
>      smp_test.sh-734   [000] dNh.  1049.860346: cpu_frequency_irq_run_list: cpu_id=0, flag=4
>           <idle>-0     [001] d.h1  1050.896329: cpu_frequency_irq_run_list: cpu_id=1, flag=4
>          cpuhp/1-13    [001] ....  1050.916319: cpu_frequency_irq_work_sync: cpu_id=1, cpu=3, flag=3
> 
> [Anson] we can see when CPU1 start online and tried to sync irq work, found
> it is pending on CPU3 which is offline, and in this period, no irq work
> claimed by cpufreq_governor, 

So I'm wondering how it is possible at all that work->cpu value is 3 at this
point.

The last CPU that wrote to work->cpu was CPU0 and the written value was -1, and
CPU3 saw that value when it was running irq_work_sync().

There is no sane way by which work->cpu can be equal to 3 from CPU1's perspective,
because the last value written to it by CPU1 itself was -1 and the last value
written to it by any other CPU also was -1.

Moreover, after CPU3 had updated it for the last time (and the last value written
to it by CPU3 had been -1), other CPUs, *including* CPU1, updated it too
(multiple times).

So the only theory that can explain why CPU1 sees 3 in there when it is going
online appears to be some silent memory corruption.

That said, have you tried to make the READ_ONCE() change suggested a while ago?




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-05  8:53                     ` Anson Huang
  2019-12-05 10:48                       ` Rafael J. Wysocki
@ 2019-12-05 11:00                       ` Viresh Kumar
  2019-12-05 11:10                         ` Rafael J. Wysocki
  1 sibling, 1 reply; 57+ messages in thread
From: Viresh Kumar @ 2019-12-05 11:00 UTC (permalink / raw)
  To: Anson Huang; +Cc: Rafael J. Wysocki, Jacky Bai, linux-pm

On 05-12-19, 08:53, Anson Huang wrote:
> Hi, Rafael
> 	This issue is very weird, the irq_work used in cpufreq_governor.c is very simple, ONLY one entry to claim the irq_work, and cpufreq_governor's irq_work is a private irq_work structure, no other drivers use it. I added some trace event in cpufreq_governor.c and irq_work.c, every time, the issue happened at the point of CPU1/2/3 all off, and CPU1 start ON line, but when CPU1 tried to sync the irq_work in cpufreq_dbs_governor_stop(), the irq_work shows that previous work is pending on CPU3 which is offline, I also had the trace event in irq_work_claim(),  but no any log shows the cpufreq_governor irq_work is claimed on CPU3 after CPU3 offline, below is the debug patch I added and the log on 2 consoles:
> 	If I understand it correctly, the irq work used in cpufreq_governor ONLY has one entry of calling irq_work_queue() which will be ONLY claimed on the CPU calling the irq_work_queue(), but from trace result, I have NOT see where CPU3 could call irq_work_queue() after it finishes the irq work sync before offline. Could it something wrong related to cache maintain during CPU hotplug?

I think you earlier said that the issue wasn't there in 4.19 kernel,
right ? What about doing git bisect to see if we can find the
offending commit ?

-- 
viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-05 11:00                       ` Viresh Kumar
@ 2019-12-05 11:10                         ` Rafael J. Wysocki
  2019-12-05 11:17                           ` Viresh Kumar
  0 siblings, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-05 11:10 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: Anson Huang, Jacky Bai, linux-pm

On Thursday, December 5, 2019 12:00:34 PM CET Viresh Kumar wrote:
> On 05-12-19, 08:53, Anson Huang wrote:
> > Hi, Rafael
> > 	This issue is very weird, the irq_work used in cpufreq_governor.c is very simple, ONLY one entry to claim the irq_work, and cpufreq_governor's irq_work is a private irq_work structure, no other drivers use it. I added some trace event in cpufreq_governor.c and irq_work.c, every time, the issue happened at the point of CPU1/2/3 all off, and CPU1 start ON line, but when CPU1 tried to sync the irq_work in cpufreq_dbs_governor_stop(), the irq_work shows that previous work is pending on CPU3 which is offline, I also had the trace event in irq_work_claim(),  but no any log shows the cpufreq_governor irq_work is claimed on CPU3 after CPU3 offline, below is the debug patch I added and the log on 2 consoles:
> > 	If I understand it correctly, the irq work used in cpufreq_governor ONLY has one entry of calling irq_work_queue() which will be ONLY claimed on the CPU calling the irq_work_queue(), but from trace result, I have NOT see where CPU3 could call irq_work_queue() after it finishes the irq work sync before offline. Could it something wrong related to cache maintain during CPU hotplug?
> 
> I think you earlier said that the issue wasn't there in 4.19 kernel,
> right ? What about doing git bisect to see if we can find the
> offending commit ?

That won't hurt, but I guess that it will be just the one that started to
use irq_work ..

Also note that schedutil has the same issue.




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-05 11:10                         ` Rafael J. Wysocki
@ 2019-12-05 11:17                           ` Viresh Kumar
  0 siblings, 0 replies; 57+ messages in thread
From: Viresh Kumar @ 2019-12-05 11:17 UTC (permalink / raw)
  To: Rafael J. Wysocki, peterz; +Cc: Anson Huang, Jacky Bai, linux-pm

+Peter

On 05-12-19, 12:10, Rafael J. Wysocki wrote:
> On Thursday, December 5, 2019 12:00:34 PM CET Viresh Kumar wrote:
> > On 05-12-19, 08:53, Anson Huang wrote:
> > > Hi, Rafael
> > > 	This issue is very weird, the irq_work used in cpufreq_governor.c is very simple, ONLY one entry to claim the irq_work, and cpufreq_governor's irq_work is a private irq_work structure, no other drivers use it. I added some trace event in cpufreq_governor.c and irq_work.c, every time, the issue happened at the point of CPU1/2/3 all off, and CPU1 start ON line, but when CPU1 tried to sync the irq_work in cpufreq_dbs_governor_stop(), the irq_work shows that previous work is pending on CPU3 which is offline, I also had the trace event in irq_work_claim(),  but no any log shows the cpufreq_governor irq_work is claimed on CPU3 after CPU3 offline, below is the debug patch I added and the log on 2 consoles:
> > > 	If I understand it correctly, the irq work used in cpufreq_governor ONLY has one entry of calling irq_work_queue() which will be ONLY claimed on the CPU calling the irq_work_queue(), but from trace result, I have NOT see where CPU3 could call irq_work_queue() after it finishes the irq work sync before offline. Could it something wrong related to cache maintain during CPU hotplug?
> > 
> > I think you earlier said that the issue wasn't there in 4.19 kernel,
> > right ? What about doing git bisect to see if we can find the
> > offending commit ?
> 
> That won't hurt, but I guess that it will be just the one that started to
> use irq_work ..

cpufreq_governor.c has hardly seen any patches since 4.19, and irq_work
has been in use since long before 4.19, I think.

> Also note that schedutil has the same issue.

I agree, but at the same time both cpufreq_governor.c and schedutil
have almost exactly the same code around irq_work. It may be an update
to the irq_work implementation, or something else, that is causing the
corruption here.

And git blame may just point us to the offending patch if we are lucky
enough :)

-- 
viresh


* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-05 10:48                       ` Rafael J. Wysocki
@ 2019-12-05 13:18                         ` Anson Huang
  2019-12-05 15:52                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-12-05 13:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, Peng Fan; +Cc: Viresh Kumar, Jacky Bai, linux-pm



> -----Original Message-----
> From: Rafael J. Wysocki <rjw@rjwysocki.net>
> Sent: Thursday, December 5, 2019 6:48 PM
> To: Anson Huang <anson.huang@nxp.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>; Jacky Bai <ping.bai@nxp.com>;
> linux-pm@vger.kernel.org
> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Thursday, December 5, 2019 9:53:20 AM CET Anson Huang wrote:
> > Hi, Rafael
> >     This issue is very weird, the irq_work used in cpufreq_governor.c is
> >     very simple, ONLY one entry to claim the irq_work, and
> >     cpufreq_governor's irq_work is a private irq_work structure, no other
> >     drivers use it. I added some trace event in cpufreq_governor.c and
> >     irq_work.c, every time, the issue happened at the point of CPU1/2/3
> >     all off, and CPU1 start ON line, but when CPU1 tried to sync the
> >     irq_work in cpufreq_dbs_governor_stop(), the irq_work shows that
> >     previous work is pending on CPU3 which is offline, I also had the
> >     trace event in irq_work_claim(), but no any log shows the
> >     cpufreq_governor irq_work is claimed on CPU3 after CPU3 offline, below
> >     is the debug patch I added and the log on 2 consoles:
> >     If I understand it correctly, the irq work used in cpufreq_governor
> >     ONLY has one entry of calling irq_work_queue() which will be ONLY
> >     claimed on the CPU calling the irq_work_queue(), but from trace
> >     result, I have NOT see where CPU3 could call irq_work_queue() after it
> >     finishes the irq work sync before offline.
> 
> Right.
> 
> Which means that this particular irq_work only runs on the CPU that has run
> irq_work_queue() for it.
> 
> >     Could it something wrong related to cache maintain during CPU hotplug?
> 
> I'm not sure what is going on, but I do agree that it is weird enough. :-)
> 
> [cut]
> 
> > LOG on console 1 which does CPU1/2/3 offline and online stress test:
> > CPUHotplug: 4575 times remaining
> > [ 1047.401185] CPU1: shutdown
> > [ 1047.403917] psci: CPU1 killed.
> > [ 1047.449153] CPU2: shutdown
> > [ 1047.451880] psci: CPU2 killed.
> > [ 1047.501131] CPU3: shutdown
> > [ 1047.503857] psci: CPU3 killed.
> > [ 1048.541939] Detected VIPT I-cache on CPU1
> > [ 1048.541983] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> > [ 1048.542050] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> > [ 1048.585024] Detected VIPT I-cache on CPU2
> > [ 1048.585061] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> > [ 1048.585121] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
> > [ 1048.645070] Detected VIPT I-cache on CPU3
> > [ 1048.645112] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> > [ 1048.645181] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
> > CPUHotplug: 4574 times remaining
> > [ 1049.769187] CPU1: shutdown
> > [ 1049.771913] psci: CPU1 killed.
> > [ 1049.809126] CPU2: shutdown
> > [ 1049.811856] psci: CPU2 killed.
> > [ 1049.853135] CPU3: shutdown
> > [ 1049.855868] psci: CPU3 killed.
> >
> > Waiting here forever.....
> >
> > LOG on console 2 which enables the trace events I added upper:
> >              sed-4591  [003] d..4  1049.705561: cpu_frequency_irq_claim: cpu_id=3, flag=3
> >              sed-4591  [003] dNh1  1049.705604: cpu_frequency_irq_run_list: cpu_id=3, flag=0
> 
> So here CPU3 runs an IRQ work, presumably the cpufreq governor's one.
> 
> After that its raised_list should be empty and it doesn't claim any IRQ works
> going forward.
> 
> >           <idle>-0     [001] d.s2  1049.716308: cpu_frequency_irq_work: cpu_id=1, cpu=-1
> >           <idle>-0     [001] d.s2  1049.716319: cpu_frequency_irq_claim: cpu_id=1, flag=3
> >           <idle>-0     [001] dNH2  1049.716338: cpu_frequency_irq_run_list: cpu_id=1, flag=0
> 
> And now CPU1 runs the cpufreq governor IRQ work, so it sets work->cpu to 1
> and then to -1 (when flushing raised_list).
> 
> >           <idle>-0     [002] d.s2  1049.728303: cpu_frequency_irq_work: cpu_id=2, cpu=-1
> >           <idle>-0     [002] d.s2  1049.728307: cpu_frequency_irq_claim: cpu_id=2, flag=3
> >           <idle>-0     [002] dNH2  1049.728320: cpu_frequency_irq_run_list: cpu_id=2, flag=0
> >           <idle>-0     [001] d.s2  1049.740305: cpu_frequency_irq_work: cpu_id=1, cpu=-1
> >           <idle>-0     [001] d.s2  1049.740307: cpu_frequency_irq_claim: cpu_id=1, flag=3
> >           <idle>-0     [001] dNH2  1049.740319: cpu_frequency_irq_run_list: cpu_id=1, flag=0
> >           <idle>-0     [001] d.s2  1049.752305: cpu_frequency_irq_work: cpu_id=1, cpu=-1
> >           <idle>-0     [001] d.s2  1049.752307: cpu_frequency_irq_claim: cpu_id=1, flag=3
> >           <idle>-0     [001] dNH2  1049.752316: cpu_frequency_irq_run_list: cpu_id=1, flag=0
> >          cpuhp/1-13    [001] ....  1049.768340: cpu_frequency_irq_work_sync: cpu_id=1, cpu=-1, flag=0
> >          cpuhp/1-13    [001] d..4  1049.768681: cpu_frequency_irq_work: cpu_id=1, cpu=-1
> >          cpuhp/1-13    [001] d..4  1049.768683: cpu_frequency_irq_claim: cpu_id=1, flag=3
> >          cpuhp/1-13    [001] dNh1  1049.768698: cpu_frequency_irq_run_list: cpu_id=1, flag=0
> >      smp_test.sh-734   [000] ...1  1049.771903: cpu_frequency_irq_claim: cpu_id=0, flag=7
> >      smp_test.sh-734   [000] dNh1  1049.775009: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> >      smp_test.sh-734   [000] ...1  1049.776084: cpu_frequency_irq_claim: cpu_id=0, flag=7
> >      smp_test.sh-734   [000] dNh.  1049.776392: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> >      smp_test.sh-734   [000] d..2  1049.779093: cpu_frequency_irq_work: cpu_id=0, cpu=-1
> >      smp_test.sh-734   [000] d..2  1049.779103: cpu_frequency_irq_claim: cpu_id=0, flag=3
> >           <idle>-0     [000] dNh2  1049.779162: cpu_frequency_irq_run_list: cpu_id=0, flag=0
> >           <idle>-0     [000] d.s2  1049.792305: cpu_frequency_irq_work: cpu_id=0, cpu=-1
> >           <idle>-0     [000] d.s2  1049.792315: cpu_frequency_irq_claim: cpu_id=0, flag=3
> >           <idle>-0     [000] dNH2  1049.792329: cpu_frequency_irq_run_list: cpu_id=0, flag=0
> >          cpuhp/2-18    [002] ....  1049.808315: cpu_frequency_irq_work_sync: cpu_id=2, cpu=-1, flag=0
> >          cpuhp/2-18    [002] d..4  1049.808642: cpu_frequency_irq_work: cpu_id=2, cpu=-1
> >          cpuhp/2-18    [002] d..4  1049.808645: cpu_frequency_irq_claim: cpu_id=2, flag=3
> >          cpuhp/2-18    [002] dNh1  1049.808658: cpu_frequency_irq_run_list: cpu_id=2, flag=0
> >      smp_test.sh-734   [000] ...1  1049.811848: cpu_frequency_irq_claim: cpu_id=0, flag=7
> >      smp_test.sh-734   [000] dNh1  1049.814949: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> >      smp_test.sh-734   [000] ...1  1049.815988: cpu_frequency_irq_claim: cpu_id=0, flag=7
> >      smp_test.sh-734   [000] dNh1  1049.816321: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> >      smp_test.sh-734   [000] d..3  1049.818936: cpu_frequency_irq_work: cpu_id=0, cpu=-1
> >      smp_test.sh-734   [000] d..3  1049.818946: cpu_frequency_irq_claim: cpu_id=0, flag=3
> >      smp_test.sh-734   [000] dNh2  1049.818973: cpu_frequency_irq_run_list: cpu_id=0, flag=0
> >           <idle>-0     [000] d.s4  1049.832308: cpu_frequency_irq_work: cpu_id=0, cpu=-1
> >           <idle>-0     [000] d.s4  1049.832317: cpu_frequency_irq_claim: cpu_id=0, flag=3
> >           <idle>-0     [000] dNH3  1049.832332: cpu_frequency_irq_run_list: cpu_id=0, flag=0
> >          cpuhp/3-23    [003] ....  1049.852314: cpu_frequency_irq_work_sync: cpu_id=3, cpu=-1, flag=0
> >
> > [Anson] when CPU3 offline, the irq work sync is successfully, no irq
> > work pending any more;
> >
> >      smp_test.sh-734   [000] ...1  1049.855859: cpu_frequency_irq_claim: cpu_id=0, flag=7
> >      smp_test.sh-734   [000] dNh1  1049.858958: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> >      smp_test.sh-734   [000] ...1  1049.859990: cpu_frequency_irq_claim: cpu_id=0, flag=7
> >      smp_test.sh-734   [000] dNh.  1049.860346: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> >           <idle>-0     [001] d.h1  1050.896329: cpu_frequency_irq_run_list: cpu_id=1, flag=4
> >          cpuhp/1-13    [001] ....  1050.916319: cpu_frequency_irq_work_sync: cpu_id=1, cpu=3, flag=3
> >
> > [Anson] we can see when CPU1 start online and tried to sync irq work,
> > found it is pending on CPU3 which is offline, and in this period, no
> > irq work claimed by cpufreq_governor,
> 
> So I'm wondering how it is possible at all that work->cpu value is 3 at this
> point.
> 
> The last CPU that wrote to work->cpu was CPU0 and the written value was -1,
> and CPU3 saw that value when it was running irq_work_sync().
> 
> There is no sane way by which work->cpu can be equal to 3 from CPU1's
> perspective, because the last value written to it by CPU1 itself was -1 and the
> last value written to it by any other CPU also was -1.
> 
> Moreover, after CPU3 had updated it last time (and the last value written to
> it by CPU3 had been -1), other CPUs, *including* CPU1, updated it too (and
> that for multiple times).
> 
> So the only theory that can explain why CPU1 sees 3 in there when it is going
> online appears to be some silent memory corruption.
> 
> That said, have you tried to make the READ_ONCE() change suggested a
> while ago?

The patch below, which uses READ_ONCE(), does NOT help, assuming I made the change correctly:

@@ -212,7 +208,7 @@ void irq_work_sync(struct irq_work *work)
 {
        lockdep_assert_irqs_enabled();

-       while (work->flags & IRQ_WORK_BUSY)
+       while (READ_ONCE(work->flags) & IRQ_WORK_BUSY)
                cpu_relax();
 }
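
For reference, a minimal user-space sketch of what the READ_ONCE() change
above amounts to. It assumes GCC or Clang (for __typeof__), the BUSY bit
position is an assumption, and every SKETCH_/sketch_ name is made up for
illustration; this is not the kernel's definition. As far as I understand,
READ_ONCE() on a scalar boils down to a volatile load, so it only stops the
compiler from caching work->flags across loop iterations. It adds no barrier
and no cache maintenance, so it cannot help if the value in memory really is
stale or corrupted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's READ_ONCE() on scalar types:
 * a single load through a volatile-qualified pointer. */
#define SKETCH_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Assumed bit position of the BUSY flag (illustration only). */
#define SKETCH_IRQ_WORK_BUSY (1UL << 1)

/* Cut-down stand-in for struct irq_work; only the flags word matters here. */
struct sketch_irq_work {
	unsigned long flags;
};

/* Returns true while the BUSY bit is set, i.e. the condition that the
 * loop in irq_work_sync() keeps spinning on. */
static bool sketch_work_busy(struct sketch_irq_work *work)
{
	assert(work != NULL);
	return (SKETCH_READ_ONCE(work->flags) & SKETCH_IRQ_WORK_BUSY) != 0;
}
```

The helper only reports the BUSY bit as it is currently stored, which is all
the spin loop can observe.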

LOG:
CPUHotplug: 4937 times remaining
[  214.837047] CPU1: shutdown
[  214.839781] psci: CPU1 killed.
[  214.877041] CPU2: shutdown
[  214.879767] psci: CPU2 killed.
[  214.917026] CPU3: shutdown
[  214.919758] psci: CPU3 killed.
[  215.957816] Detected VIPT I-cache on CPU1
[  215.957860] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
[  215.957930] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
[  216.001025] Detected VIPT I-cache on CPU2
[  216.001064] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
[  216.001126] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
[  216.068960] Detected VIPT I-cache on CPU3
[  216.069004] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
[  216.069076] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
CPUHotplug: 4936 times remaining
[  217.201055] CPU1: shutdown
[  217.203779] psci: CPU1 killed.
[  400.506869] audit: type=1006 audit(1573738201.312:3): pid=1332 uid=0 old-aui1
[ 4000.600430] audit: type=1006 audit(1573741801.408:4): pid=1352 uid=0 old-aui1
[ 7600.687496] audit: type=1006 audit(1573745401.492:5): pid=1371 uid=0 old-aui1


         cpuhp/1-13    [001] ....   217.200231: cpu_frequency_irq_work_sync: cpu_id=1, cpu=-1, flag=0
     smp_test.sh-741   [002] ...1   217.203770: cpu_frequency_irq_claim: cpu_id=2, flag=7
     smp_test.sh-741   [002] d.h1   217.206873: cpu_frequency_irq_run_list: cpu_id=2, flag=4
     smp_test.sh-741   [002] ...1   217.206893: cpu_frequency_irq_claim: cpu_id=2, flag=7
     smp_test.sh-741   [002] dNh.   217.208222: cpu_frequency_irq_run_list: cpu_id=2, flag=4
         cpuhp/2-18    [002] ....   217.248206: cpu_frequency_irq_work_sync: cpu_id=2, cpu=1, flag=3

[Anson] This time, the irq_work is pending on CPU1, which is offline.
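
The flag= values in these traces are easier to follow when decoded. A small
sketch, assuming the bit layout I remember from include/linux/irq_work.h
around v5.4 (PENDING = bit 0, BUSY = bit 1, LAZY = bit 2); the SK_ names and
sk_decode_flags() are hypothetical helpers, not kernel symbols:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed irq_work flag bits (illustration only, not copied from a tree). */
#define SK_PENDING (1u << 0)
#define SK_BUSY    (1u << 1)
#define SK_LAZY    (1u << 2)

/* Write a space-separated list of the set flag names into buf and return
 * buf. A value with none of the three bits set decodes to "idle". */
static const char *sk_decode_flags(unsigned int flags, char *buf, size_t len)
{
	size_t n;

	assert(buf != NULL && len > 0);

	if (!(flags & (SK_PENDING | SK_BUSY | SK_LAZY))) {
		snprintf(buf, len, "idle");
		return buf;
	}

	snprintf(buf, len, "%s%s%s",
		 (flags & SK_PENDING) ? "PENDING " : "",
		 (flags & SK_BUSY) ? "BUSY " : "",
		 (flags & SK_LAZY) ? "LAZY " : "");

	/* Drop the trailing separator left by the last name. */
	n = strlen(buf);
	if (n > 0)
		buf[n - 1] = '\0';

	return buf;
}
```

Under that assumption, flag=3 decodes to PENDING BUSY (claimed, not yet run),
flag=7 to a claimed lazy work, flag=4 to a lazy work that is no longer
claimed, and flag=0 to a free one.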

         kauditd-31    [000] ...1   400.519304: cpu_frequency_irq_claim: cpu_id=0, flag=7
          <idle>-0     [000] dNh1   400.520231: cpu_frequency_irq_run_list: cpu_id=0, flag=4
         kauditd-31    [003] ...1  4000.612845: cpu_frequency_irq_claim: cpu_id=3, flag=7
           crond-1352  [003] d.h.  4000.616221: cpu_frequency_irq_run_list: cpu_id=3, flag=4
         kauditd-31    [000] ...1  7600.699988: cpu_frequency_irq_claim: cpu_id=0, flag=7
          <idle>-0     [000] dNh1  7600.700205: cpu_frequency_irq_run_list: cpu_id=0, flag=4
root@imx8qxpmek:~#




* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-05 13:18                         ` Anson Huang
@ 2019-12-05 15:52                           ` Rafael J. Wysocki
  2019-12-09 10:31                             ` Peng Fan
  2019-12-09 10:37                             ` Anson Huang
  0 siblings, 2 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-05 15:52 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Peng Fan, Viresh Kumar, Jacky Bai, linux-pm

On Thu, Dec 5, 2019 at 2:18 PM Anson Huang <anson.huang@nxp.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Sent: Thursday, December 5, 2019 6:48 PM
> > To: Anson Huang <anson.huang@nxp.com>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>; Jacky Bai <ping.bai@nxp.com>;
> > linux-pm@vger.kernel.org
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> >
> > On Thursday, December 5, 2019 9:53:20 AM CET Anson Huang wrote:
> > > Hi, Rafael
> > >     This issue is very weird, the irq_work used in cpufreq_governor.c
> > >     is very simple, ONLY one entry to claim the irq_work, and
> > >     cpufreq_governor's irq_work is a private irq_work structure, no
> > >     other drivers use it. I added some trace event in
> > >     cpufreq_governor.c and irq_work.c, every time, the issue happened
> > >     at the point of CPU1/2/3 all off, and CPU1 start ON line, but when
> > >     CPU1 tried to sync the irq_work in cpufreq_dbs_governor_stop(), the
> > >     irq_work shows that previous work is pending on CPU3 which is
> > >     offline, I also had the trace event in irq_work_claim(), but no any
> > >     log shows the cpufreq_governor irq_work is claimed on CPU3 after
> > >     CPU3 offline, below is the debug patch I added and the log on 2
> > >     consoles:
> > >     If I understand it correctly, the irq work used in cpufreq_governor
> > >     ONLY has one entry of calling irq_work_queue() which will be ONLY
> > >     claimed on the CPU calling the irq_work_queue(), but from trace
> > >     result, I have NOT see where CPU3 could call irq_work_queue() after
> > >     it finishes the irq work sync before offline.
> >
> > Right.
> >
> > Which means that this particular irq_work only runs on the CPU that has run
> > irq_work_queue() for it.
> >
> > >     Could it something wrong related to cache maintain during CPU hotplug?
> >
> > I'm not sure what is going on, but I do agree that it is weird enough. :-)
> >
> > [cut]
> >
> > > LOG on console 1 which does CPU1/2/3 offline and online stress test:
> > > CPUHotplug: 4575 times remaining
> > > [ 1047.401185] CPU1: shutdown
> > > [ 1047.403917] psci: CPU1 killed.
> > > [ 1047.449153] CPU2: shutdown
> > > [ 1047.451880] psci: CPU2 killed.
> > > [ 1047.501131] CPU3: shutdown
> > > [ 1047.503857] psci: CPU3 killed.
> > > [ 1048.541939] Detected VIPT I-cache on CPU1
> > > [ 1048.541983] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> > > [ 1048.542050] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> > > [ 1048.585024] Detected VIPT I-cache on CPU2
> > > [ 1048.585061] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> > > [ 1048.585121] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
> > > [ 1048.645070] Detected VIPT I-cache on CPU3
> > > [ 1048.645112] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> > > [ 1048.645181] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
> > > CPUHotplug: 4574 times remaining
> > > [ 1049.769187] CPU1: shutdown
> > > [ 1049.771913] psci: CPU1 killed.
> > > [ 1049.809126] CPU2: shutdown
> > > [ 1049.811856] psci: CPU2 killed.
> > > [ 1049.853135] CPU3: shutdown
> > > [ 1049.855868] psci: CPU3 killed.
> > >
> > > Waiting here forever.....
> > >
> > > LOG on console 2 which enables the trace events I added upper:
> > >              sed-4591  [003] d..4  1049.705561: cpu_frequency_irq_claim: cpu_id=3, flag=3
> > >              sed-4591  [003] dNh1  1049.705604: cpu_frequency_irq_run_list: cpu_id=3, flag=0
> >
> > So here CPU3 runs an IRQ work, presumably the cpufreq governor's one.
> >
> > After that its raised_list should be empty and it doesn't claim any IRQ works
> > going forward.
> >
> > >           <idle>-0     [001] d.s2  1049.716308: cpu_frequency_irq_work: cpu_id=1, cpu=-1
> > >           <idle>-0     [001] d.s2  1049.716319: cpu_frequency_irq_claim: cpu_id=1, flag=3
> > >           <idle>-0     [001] dNH2  1049.716338: cpu_frequency_irq_run_list: cpu_id=1, flag=0
> >
> > And now CPU1 runs the cpufreq governor IRQ work, so it sets work->cpu to 1
> > and then to -1 (when flushing raised_list).
> >
> > >           <idle>-0     [002] d.s2  1049.728303: cpu_frequency_irq_work: cpu_id=2, cpu=-1
> > >           <idle>-0     [002] d.s2  1049.728307: cpu_frequency_irq_claim: cpu_id=2, flag=3
> > >           <idle>-0     [002] dNH2  1049.728320: cpu_frequency_irq_run_list: cpu_id=2, flag=0
> > >           <idle>-0     [001] d.s2  1049.740305: cpu_frequency_irq_work: cpu_id=1, cpu=-1
> > >           <idle>-0     [001] d.s2  1049.740307: cpu_frequency_irq_claim: cpu_id=1, flag=3
> > >           <idle>-0     [001] dNH2  1049.740319: cpu_frequency_irq_run_list: cpu_id=1, flag=0
> > >           <idle>-0     [001] d.s2  1049.752305: cpu_frequency_irq_work: cpu_id=1, cpu=-1
> > >           <idle>-0     [001] d.s2  1049.752307: cpu_frequency_irq_claim: cpu_id=1, flag=3
> > >           <idle>-0     [001] dNH2  1049.752316: cpu_frequency_irq_run_list: cpu_id=1, flag=0
> > >          cpuhp/1-13    [001] ....  1049.768340: cpu_frequency_irq_work_sync: cpu_id=1, cpu=-1, flag=0
> > >          cpuhp/1-13    [001] d..4  1049.768681: cpu_frequency_irq_work: cpu_id=1, cpu=-1
> > >          cpuhp/1-13    [001] d..4  1049.768683: cpu_frequency_irq_claim: cpu_id=1, flag=3
> > >          cpuhp/1-13    [001] dNh1  1049.768698: cpu_frequency_irq_run_list: cpu_id=1, flag=0
> > >      smp_test.sh-734   [000] ...1  1049.771903: cpu_frequency_irq_claim: cpu_id=0, flag=7
> > >      smp_test.sh-734   [000] dNh1  1049.775009: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> > >      smp_test.sh-734   [000] ...1  1049.776084: cpu_frequency_irq_claim: cpu_id=0, flag=7
> > >      smp_test.sh-734   [000] dNh.  1049.776392: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> > >      smp_test.sh-734   [000] d..2  1049.779093: cpu_frequency_irq_work: cpu_id=0, cpu=-1
> > >      smp_test.sh-734   [000] d..2  1049.779103: cpu_frequency_irq_claim: cpu_id=0, flag=3
> > >           <idle>-0     [000] dNh2  1049.779162: cpu_frequency_irq_run_list: cpu_id=0, flag=0
> > >           <idle>-0     [000] d.s2  1049.792305: cpu_frequency_irq_work: cpu_id=0, cpu=-1
> > >           <idle>-0     [000] d.s2  1049.792315: cpu_frequency_irq_claim: cpu_id=0, flag=3
> > >           <idle>-0     [000] dNH2  1049.792329: cpu_frequency_irq_run_list: cpu_id=0, flag=0
> > >          cpuhp/2-18    [002] ....  1049.808315: cpu_frequency_irq_work_sync: cpu_id=2, cpu=-1, flag=0
> > >          cpuhp/2-18    [002] d..4  1049.808642: cpu_frequency_irq_work: cpu_id=2, cpu=-1
> > >          cpuhp/2-18    [002] d..4  1049.808645: cpu_frequency_irq_claim: cpu_id=2, flag=3
> > >          cpuhp/2-18    [002] dNh1  1049.808658: cpu_frequency_irq_run_list: cpu_id=2, flag=0
> > >      smp_test.sh-734   [000] ...1  1049.811848: cpu_frequency_irq_claim: cpu_id=0, flag=7
> > >      smp_test.sh-734   [000] dNh1  1049.814949: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> > >      smp_test.sh-734   [000] ...1  1049.815988: cpu_frequency_irq_claim: cpu_id=0, flag=7
> > >      smp_test.sh-734   [000] dNh1  1049.816321: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> > >      smp_test.sh-734   [000] d..3  1049.818936: cpu_frequency_irq_work: cpu_id=0, cpu=-1
> > >      smp_test.sh-734   [000] d..3  1049.818946: cpu_frequency_irq_claim: cpu_id=0, flag=3
> > >      smp_test.sh-734   [000] dNh2  1049.818973: cpu_frequency_irq_run_list: cpu_id=0, flag=0
> > >           <idle>-0     [000] d.s4  1049.832308: cpu_frequency_irq_work: cpu_id=0, cpu=-1
> > >           <idle>-0     [000] d.s4  1049.832317: cpu_frequency_irq_claim: cpu_id=0, flag=3
> > >           <idle>-0     [000] dNH3  1049.832332: cpu_frequency_irq_run_list: cpu_id=0, flag=0
> > >          cpuhp/3-23    [003] ....  1049.852314: cpu_frequency_irq_work_sync: cpu_id=3, cpu=-1, flag=0
> > >
> > > [Anson] when CPU3 offline, the irq work sync is successfully, no irq
> > > work pending any more;
> > >
> > >      smp_test.sh-734   [000] ...1  1049.855859: cpu_frequency_irq_claim: cpu_id=0, flag=7
> > >      smp_test.sh-734   [000] dNh1  1049.858958: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> > >      smp_test.sh-734   [000] ...1  1049.859990: cpu_frequency_irq_claim: cpu_id=0, flag=7
> > >      smp_test.sh-734   [000] dNh.  1049.860346: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> > >           <idle>-0     [001] d.h1  1050.896329: cpu_frequency_irq_run_list: cpu_id=1, flag=4
> > >          cpuhp/1-13    [001] ....  1050.916319: cpu_frequency_irq_work_sync: cpu_id=1, cpu=3, flag=3
> > >
> > > [Anson] we can see when CPU1 start online and tried to sync irq work,
> > > found it is pending on CPU3 which is offline, and in this period, no
> > > irq work claimed by cpufreq_governor,
> >
> > So I'm wondering how it is possible at all that work->cpu value is 3 at this
> > point.
> >
> > The last CPU that wrote to work->cpu was CPU0 and the written value was -1,
> > and CPU3 saw that value when it was running irq_work_sync().
> >
> > There is no sane way by which work->cpu can be equal to 3 from CPU1's
> > perspective, because the last value written to it by CPU1 itself was -1 and the
> > last value written to it by any other CPU also was -1.
> >
> > Moreover, after CPU3 had updated it last time (and the last value written to
> > it by CPU3 had been -1), other CPUs, *including* CPU1, updated it too (and
> > that for multiple times).
> >
> > So the only theory that can explain why CPU1 sees 3 in there when it is going
> > online appears to be some silent memory corruption.
> >
> > That said, have you tried to make the READ_ONCE() change suggested a
> > while ago?
>
> Below patch does NOT work using READ_ONCE() if I did the change correctly:

OK

> @@ -212,7 +208,7 @@ void irq_work_sync(struct irq_work *work)
>  {
>         lockdep_assert_irqs_enabled();
>
> -       while (work->flags & IRQ_WORK_BUSY)
> +       while (READ_ONCE(work->flags) & IRQ_WORK_BUSY)

You also may try using test_bit() instead of the raw read, but anyway
at this point I would start talking to the arch/HW people if I were
you.

>                 cpu_relax();
>  }
>
> LOG:
> CPUHotplug: 4937 times remaining
> [  214.837047] CPU1: shutdown
> [  214.839781] psci: CPU1 killed.
> [  214.877041] CPU2: shutdown
> [  214.879767] psci: CPU2 killed.
> [  214.917026] CPU3: shutdown
> [  214.919758] psci: CPU3 killed.
> [  215.957816] Detected VIPT I-cache on CPU1
> [  215.957860] GICv3: CPU1: found redistributor 1 region 0:0x0000000051b20000
> [  215.957930] CPU1: Booted secondary processor 0x0000000001 [0x410fd042]
> [  216.001025] Detected VIPT I-cache on CPU2
> [  216.001064] GICv3: CPU2: found redistributor 2 region 0:0x0000000051b40000
> [  216.001126] CPU2: Booted secondary processor 0x0000000002 [0x410fd042]
> [  216.068960] Detected VIPT I-cache on CPU3
> [  216.069004] GICv3: CPU3: found redistributor 3 region 0:0x0000000051b60000
> [  216.069076] CPU3: Booted secondary processor 0x0000000003 [0x410fd042]
> CPUHotplug: 4936 times remaining
> [  217.201055] CPU1: shutdown
> [  217.203779] psci: CPU1 killed.
> [  400.506869] audit: type=1006 audit(1573738201.312:3): pid=1332 uid=0 old-aui1
> [ 4000.600430] audit: type=1006 audit(1573741801.408:4): pid=1352 uid=0 old-aui1
> [ 7600.687496] audit: type=1006 audit(1573745401.492:5): pid=1371 uid=0 old-aui1
>
>
>          cpuhp/1-13    [001] ....   217.200231: cpu_frequency_irq_work_sync: cpu_id=1, cpu=-1, flag=0
>      smp_test.sh-741   [002] ...1   217.203770: cpu_frequency_irq_claim: cpu_id=2, flag=7
>      smp_test.sh-741   [002] d.h1   217.206873: cpu_frequency_irq_run_list: cpu_id=2, flag=4
>      smp_test.sh-741   [002] ...1   217.206893: cpu_frequency_irq_claim: cpu_id=2, flag=7
>      smp_test.sh-741   [002] dNh.   217.208222: cpu_frequency_irq_run_list: cpu_id=2, flag=4
>          cpuhp/2-18    [002] ....   217.248206: cpu_frequency_irq_work_sync: cpu_id=2, cpu=1, flag=3
>
> [Anson] this time, the irq work is pending on CPU1 which is offline.
>
>          kauditd-31    [000] ...1   400.519304: cpu_frequency_irq_claim: cpu_id=0, flag=7
>           <idle>-0     [000] dNh1   400.520231: cpu_frequency_irq_run_list: cpu_id=0, flag=4
>          kauditd-31    [003] ...1  4000.612845: cpu_frequency_irq_claim: cpu_id=3, flag=7
>            crond-1352  [003] d.h.  4000.616221: cpu_frequency_irq_run_list: cpu_id=3, flag=4
>          kauditd-31    [000] ...1  7600.699988: cpu_frequency_irq_claim: cpu_id=0, flag=7
>           <idle>-0     [000] dNh1  7600.700205: cpu_frequency_irq_run_list: cpu_id=0, flag=4
> root@imx8qxpmek:~#
>
>


* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-05 15:52                           ` Rafael J. Wysocki
@ 2019-12-09 10:31                             ` Peng Fan
  2019-12-09 10:37                             ` Anson Huang
  1 sibling, 0 replies; 57+ messages in thread
From: Peng Fan @ 2019-12-09 10:31 UTC (permalink / raw)
  To: Rafael J. Wysocki, Anson Huang
  Cc: Rafael J. Wysocki, Viresh Kumar, Jacky Bai, linux-pm

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver

Is dbs_update_util_handler() expected to run on a different CPU than the
cpu_of(rq) used in cpufreq_update_util()?

If not, dbs_update_util_handler() will inject the IRQ into the wrong CPU.
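
To make the concern concrete, here is a toy user-space model (not kernel
code; all toy_ names are invented for illustration, and the flag bits are
assumed) of a work item that is claimed and queued on one CPU's list. If that
CPU goes offline before it runs the list, the flags stay claimed, and anything
spinning on the BUSY bit, as irq_work_sync() does, never makes progress. This
only models the symptom seen in the traces; it does not show how the work
ends up on the offline CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4
#define TOY_PENDING (1u << 0)	/* assumed bit layout */
#define TOY_BUSY    (1u << 1)
#define TOY_CLAIMED (TOY_PENDING | TOY_BUSY)

struct toy_irq_work {
	unsigned int flags;
	int cpu;			/* CPU whose list holds the work, -1 if none */
	struct toy_irq_work *next;
};

/* One singly linked list of raised work per CPU, plus an online mask. */
static struct toy_irq_work *toy_raised[TOY_NR_CPUS];
static bool toy_online[TOY_NR_CPUS] = { true, true, true, true };

/* Claim the work and put it on @cpu's list; false if it is already pending. */
static bool toy_queue_on(struct toy_irq_work *work, int cpu)
{
	assert(work != NULL && cpu >= 0 && cpu < TOY_NR_CPUS);

	if (work->flags & TOY_PENDING)
		return false;

	work->flags |= TOY_CLAIMED;
	work->cpu = cpu;
	work->next = toy_raised[cpu];
	toy_raised[cpu] = work;
	return true;
}

/* Run everything queued on @cpu, but only if that CPU is online. */
static void toy_run(int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	if (!toy_online[cpu])
		return;

	while (toy_raised[cpu]) {
		struct toy_irq_work *work = toy_raised[cpu];

		toy_raised[cpu] = work->next;
		work->next = NULL;
		work->cpu = -1;
		work->flags &= ~TOY_CLAIMED;	/* work is free again */
	}
}

/* True while a sync-style spin loop on the BUSY bit would keep waiting. */
static bool toy_sync_would_block(const struct toy_irq_work *work)
{
	return (work->flags & TOY_BUSY) != 0;
}
```

Queueing a work item on CPU3, marking CPU3 offline and then calling
toy_run(3) leaves cpu == 3 and the claimed bits set, which is the same
picture as the "cpu=3, flag=3" line in the trace.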

I tried two approaches. I am not very sure about either of them, so please
help check approach 1 or 2, or give any other advice.
1st (only the relevant part is shown, to keep it understandable):
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 4bb054d0cb43..9666ee74d7b7 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -267,7 +267,7 @@ static void dbs_irq_work(struct irq_work *irq_work)
 }

 static void dbs_update_util_handler(struct update_util_data *data, u64 time,
-                                   unsigned int flags)
+                                   unsigned int flags, int cpu)
 {
        struct cpu_dbs_info *cdbs = container_of(data, struct cpu_dbs_info, update_util);
        struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
@@ -316,7 +316,8 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,

        policy_dbs->last_sample_time = time;
        policy_dbs->work_in_progress = true;
-       irq_work_queue(&policy_dbs->irq_work);
+
+       irq_work_queue_on(&policy_dbs->irq_work, cpu);
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c8870c5bd7df..27a334208a53 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2302,7 +2302,7 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
        data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
                                                  cpu_of(rq)));
        if (data)
-               data->func(data, rq_clock(rq), flags);
+               data->func(data, rq_clock(rq), flags, cpu_of(rq));
 }
 #else
 static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}

2nd:
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 4bb054d0cb43..69e97cbdf9ba 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -266,6 +266,7 @@ static void dbs_irq_work(struct irq_work *irq_work)
        schedule_work_on(smp_processor_id(), &policy_dbs->work);
 }

+extern void *util_data_valid(void);
 static void dbs_update_util_handler(struct update_util_data *data, u64 time,
                                     unsigned int flags)
 {
@@ -316,6 +317,9 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,

         policy_dbs->last_sample_time = time;
         policy_dbs->work_in_progress = true;
+
+       if (!util_data_valid())
+               return;
         irq_work_queue(&policy_dbs->irq_work);
 }

diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index b5dcd1d83c7f..44e7037f8dc9 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -5,6 +5,7 @@
   * Copyright (C) 2016, Intel Corporation
   * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
   */
+#include <linux/rcupdate.h>
 #include "sched.h"

 DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
@@ -57,3 +58,9 @@ void cpufreq_remove_update_util_hook(int cpu)
         rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
 }
 EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
+
+void *util_data_valid(void)
+{
+       return rcu_dereference(per_cpu(cpufreq_update_util_data, smp_processor_id()));
+}
+EXPORT_SYMBOL_GPL(util_data_valid);

Thanks,
Peng.

> 
> On Thu, Dec 5, 2019 at 2:18 PM Anson Huang <anson.huang@nxp.com>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > Sent: Thursday, December 5, 2019 6:48 PM
> > > To: Anson Huang <anson.huang@nxp.com>
> > > Cc: Viresh Kumar <viresh.kumar@linaro.org>; Jacky Bai
> > > <ping.bai@nxp.com>; linux-pm@vger.kernel.org
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > >
> > > On Thursday, December 5, 2019 9:53:20 AM CET Anson Huang wrote:
> > > > Hi, Rafael
> > > >     This issue is very weird, the irq_work used in cpufreq_governor.c
> > > >     is very simple, ONLY one entry to claim the irq_work, and
> > > >     cpufreq_governor's irq_work is a private irq_work structure, no
> > > >     other drivers use it. I added some trace event in
> > > >     cpufreq_governor.c and irq_work.c, every time, the issue happened
> > > >     at the point of CPU1/2/3 all off, and CPU1 start ON line, but
> > > >     when CPU1 tried to sync the irq_work in
> > > >     cpufreq_dbs_governor_stop(), the irq_work shows that previous
> > > >     work is pending on CPU3 which is offline, I also had the trace
> > > >     event in irq_work_claim(), but no any log shows the
> > > >     cpufreq_governor irq_work is claimed on CPU3 after CPU3 offline,
> > > >     below is the debug patch I added and the log on 2 consoles:
> > > >     If I understand it correctly, the irq work used in
> > > >     cpufreq_governor ONLY has one entry of calling irq_work_queue()
> > > >     which will be ONLY claimed on the CPU calling the
> > > >     irq_work_queue(), but from trace result, I have NOT see where
> > > >     CPU3 could call irq_work_queue() after it finishes the irq work
> > > >     sync before offline.
> > >
> > > Right.
> > >
> > > Which means that this particular irq_work only runs on the CPU that
> > > has run
> > > irq_work_queue() for it.
> > >
> > > >     Could it something wrong related to cache maintain during CPU
> > > hotplug?
> > >
> > > I'm not sure what is going on, but I do agree that it is weird
> > > enough. :-)
> > >
> > > [cut]
> > >
> > > > LOG on console 1 which does CPU1/2/3 offline and online stress test:
> > > > CPUHotplug: 4575 times remaining
> > > > [ 1047.401185] CPU1: shutdown
> > > > [ 1047.403917] psci: CPU1 killed.
> > > > [ 1047.449153] CPU2: shutdown
> > > > [ 1047.451880] psci: CPU2 killed.
> > > > [ 1047.501131] CPU3: shutdown
> > > > [ 1047.503857] psci: CPU3 killed.
> > > > [ 1048.541939] Detected VIPT I-cache on CPU1 [ 1048.541983] GICv3:
> > > > CPU1: found redistributor 1 region 0:0x0000000051b20000 [
> > > > 1048.542050]
> > > > CPU1: Booted secondary processor 0x0000000001 [0x410fd042] [
> > > > 1048.585024] Detected VIPT I-cache on CPU2 [ 1048.585061] GICv3:
> CPU2:
> > > > found redistributor 2 region 0:0x0000000051b40000 [ 1048.585121]
> CPU2:
> > > > Booted secondary processor 0x0000000002 [0x410fd042] [
> > > > 1048.645070] Detected VIPT I-cache on CPU3 [ 1048.645112] GICv3:
> > > > CPU3: found redistributor 3 region 0:0x0000000051b60000
> [ 1048.645181] CPU3:
> > > > Booted secondary processor 0x0000000003 [0x410fd042]
> > > > CPUHotplug: 4574 times remaining
> > > > [ 1049.769187] CPU1: shutdown
> > > > [ 1049.771913] psci: CPU1 killed.
> > > > [ 1049.809126] CPU2: shutdown
> > > > [ 1049.811856] psci: CPU2 killed.
> > > > [ 1049.853135] CPU3: shutdown
> > > > [ 1049.855868] psci: CPU3 killed.
> > > >
> > > > Waiting here forever.....
> > > >
> > > > LOG on console 2 which enables the trace events I added upper:
> > > >              sed-4591  [003] d..4  1049.705561:
> cpu_frequency_irq_claim:
> > > cpu_id=3, flag=3
> > > >              sed-4591  [003] dNh1  1049.705604:
> > > > cpu_frequency_irq_run_list: cpu_id=3, flag=0
> > >
> > > So here CPU3 runs an IRQ work, presumably the cpufreq governor's one.
> > >
> > > After that its raised_list should be empty and it doesn't claim any
> > > IRQ works going forward.
> > >
> > > >           <idle>-0     [001] d.s2  1049.716308:
> cpu_frequency_irq_work:
> > > cpu_id=1, cpu=-1
> > > >           <idle>-0     [001] d.s2  1049.716319:
> cpu_frequency_irq_claim:
> > > cpu_id=1, flag=3
> > > >           <idle>-0     [001] dNH2  1049.716338:
> cpu_frequency_irq_run_list:
> > > cpu_id=1, flag=0
> > >
> > > And now CPU1 runs the cpufreq governor IRQ work, so it sets
> > > work->cpu to 1 and then to -1 (when flushing raised_list).
> > >
> > > >           <idle>-0     [002] d.s2  1049.728303:
> cpu_frequency_irq_work:
> > > cpu_id=2, cpu=-1
> > > >           <idle>-0     [002] d.s2  1049.728307:
> cpu_frequency_irq_claim:
> > > cpu_id=2, flag=3
> > > >           <idle>-0     [002] dNH2  1049.728320:
> cpu_frequency_irq_run_list:
> > > cpu_id=2, flag=0
> > > >           <idle>-0     [001] d.s2  1049.740305:
> cpu_frequency_irq_work:
> > > cpu_id=1, cpu=-1
> > > >           <idle>-0     [001] d.s2  1049.740307:
> cpu_frequency_irq_claim:
> > > cpu_id=1, flag=3
> > > >           <idle>-0     [001] dNH2  1049.740319:
> cpu_frequency_irq_run_list:
> > > cpu_id=1, flag=0
> > > >           <idle>-0     [001] d.s2  1049.752305:
> cpu_frequency_irq_work:
> > > cpu_id=1, cpu=-1
> > > >           <idle>-0     [001] d.s2  1049.752307:
> cpu_frequency_irq_claim:
> > > cpu_id=1, flag=3
> > > >           <idle>-0     [001] dNH2  1049.752316:
> cpu_frequency_irq_run_list:
> > > cpu_id=1, flag=0
> > > >          cpuhp/1-13    [001] ....  1049.768340:
> cpu_frequency_irq_work_sync:
> > > cpu_id=1, cpu=-1, flag=0
> > > >          cpuhp/1-13    [001] d..4  1049.768681:
> cpu_frequency_irq_work:
> > > cpu_id=1, cpu=-1
> > > >          cpuhp/1-13    [001] d..4  1049.768683:
> cpu_frequency_irq_claim:
> > > cpu_id=1, flag=3
> > > >          cpuhp/1-13    [001] dNh1  1049.768698:
> cpu_frequency_irq_run_list:
> > > cpu_id=1, flag=0
> > > >      smp_test.sh-734   [000] ...1  1049.771903:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=7
> > > >      smp_test.sh-734   [000] dNh1  1049.775009:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=4
> > > >      smp_test.sh-734   [000] ...1  1049.776084:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=7
> > > >      smp_test.sh-734   [000] dNh.  1049.776392:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=4
> > > >      smp_test.sh-734   [000] d..2  1049.779093:
> cpu_frequency_irq_work:
> > > cpu_id=0, cpu=-1
> > > >      smp_test.sh-734   [000] d..2  1049.779103:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=3
> > > >           <idle>-0     [000] dNh2  1049.779162:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=0
> > > >           <idle>-0     [000] d.s2  1049.792305:
> cpu_frequency_irq_work:
> > > cpu_id=0, cpu=-1
> > > >           <idle>-0     [000] d.s2  1049.792315:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=3
> > > >           <idle>-0     [000] dNH2  1049.792329:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=0
> > > >          cpuhp/2-18    [002] ....  1049.808315:
> cpu_frequency_irq_work_sync:
> > > cpu_id=2, cpu=-1, flag=0
> > > >          cpuhp/2-18    [002] d..4  1049.808642:
> cpu_frequency_irq_work:
> > > cpu_id=2, cpu=-1
> > > >          cpuhp/2-18    [002] d..4  1049.808645:
> cpu_frequency_irq_claim:
> > > cpu_id=2, flag=3
> > > >          cpuhp/2-18    [002] dNh1  1049.808658:
> cpu_frequency_irq_run_list:
> > > cpu_id=2, flag=0
> > > >      smp_test.sh-734   [000] ...1  1049.811848:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=7
> > > >      smp_test.sh-734   [000] dNh1  1049.814949:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=4
> > > >      smp_test.sh-734   [000] ...1  1049.815988:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=7
> > > >      smp_test.sh-734   [000] dNh1  1049.816321:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=4
> > > >      smp_test.sh-734   [000] d..3  1049.818936:
> cpu_frequency_irq_work:
> > > cpu_id=0, cpu=-1
> > > >      smp_test.sh-734   [000] d..3  1049.818946:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=3
> > > >      smp_test.sh-734   [000] dNh2  1049.818973:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=0
> > > >           <idle>-0     [000] d.s4  1049.832308:
> cpu_frequency_irq_work:
> > > cpu_id=0, cpu=-1
> > > >           <idle>-0     [000] d.s4  1049.832317:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=3
> > > >           <idle>-0     [000] dNH3  1049.832332:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=0
> > > >          cpuhp/3-23    [003] ....  1049.852314:
> cpu_frequency_irq_work_sync:
> > > cpu_id=3, cpu=-1, flag=0
> > > >
> > > > [Anson] when CPU3 offline, the irq work sync is successfully, no
> > > > irq work pending any more;
> > > >
> > > >      smp_test.sh-734   [000] ...1  1049.855859:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=7
> > > >      smp_test.sh-734   [000] dNh1  1049.858958:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=4
> > > >      smp_test.sh-734   [000] ...1  1049.859990:
> cpu_frequency_irq_claim:
> > > cpu_id=0, flag=7
> > > >      smp_test.sh-734   [000] dNh.  1049.860346:
> cpu_frequency_irq_run_list:
> > > cpu_id=0, flag=4
> > > >           <idle>-0     [001] d.h1  1050.896329:
> cpu_frequency_irq_run_list:
> > > cpu_id=1, flag=4
> > > >          cpuhp/1-13    [001] ....  1050.916319:
> cpu_frequency_irq_work_sync:
> > > cpu_id=1, cpu=3, flag=3
> > > >
> > > > [Anson] we can see when CPU1 start online and tried to sync irq
> > > > work, found it is pending on CPU3 which is offline, and in this
> > > > period, no irq work claimed by cpufreq_governor,
> > >
> > > So I'm wondering how it is possible at all that work->cpu value is 3
> > > at this point.
> > >
> > > The last CPU that wrote to work->cpu was CPU0 and the written value
> > > was -1, and
> > > CPU3 saw that value when it was running irq_work_sync().
> > >
> > > There is no sane way by which work->cpu can be equal to 3 from
> > > CPU1's perspective, because the last value written to it by CPU1
> > > itself was -1 and the last value written to it by any other CPU also was -1.
> > >
> > > Moreover, after CPU3 had updated it last time (and the last value
> > > written to it by CPU3 had been -1), other CPUs, *including* CPU1,
> > > updated it too (and that for multiple times).
> > >
> > > So the only theory that can explain why CPU1 sees 3 in there when it
> > > is going online appears to be some silent memory corruption.
> > >
> > > That said, have you tried to make the READ_ONCE() change suggested a
> > > while ago?
> >
> > Below patch does NOT work using READ_ONCE() if I did the change
> correctly:
> 
> OK
> 
> > @@ -212,7 +208,7 @@ void irq_work_sync(struct irq_work *work)  {
> >         lockdep_assert_irqs_enabled();
> >
> > -       while (work->flags & IRQ_WORK_BUSY)
> > +       while (READ_ONCE(work->flags) & IRQ_WORK_BUSY)
> 
> You also may try using test_bit() instead of the raw read, but anyway at this
> point I would start talking to the arch/HW people if I were you.
> 
> >                 cpu_relax();
> >  }
> >
> > LOG:
> > CPUHotplug: 4937 times remaining
> > [  214.837047] CPU1: shutdown
> > [  214.839781] psci: CPU1 killed.
> > [  214.877041] CPU2: shutdown
> > [  214.879767] psci: CPU2 killed.
> > [  214.917026] CPU3: shutdown
> > [  214.919758] psci: CPU3 killed.
> > [  215.957816] Detected VIPT I-cache on CPU1 [  215.957860] GICv3:
> > CPU1: found redistributor 1 region 0:0x0000000051b20000 [  215.957930]
> > CPU1: Booted secondary processor 0x0000000001 [0x410fd042] [
> > 216.001025] Detected VIPT I-cache on CPU2 [  216.001064] GICv3: CPU2:
> > found redistributor 2 region 0:0x0000000051b40000 [  216.001126] CPU2:
> > Booted secondary processor 0x0000000002 [0x410fd042] [  216.068960]
> > Detected VIPT I-cache on CPU3 [  216.069004] GICv3: CPU3: found
> > redistributor 3 region 0:0x0000000051b60000 [  216.069076] CPU3:
> > Booted secondary processor 0x0000000003 [0x410fd042]
> > CPUHotplug: 4936 times remaining
> > [  217.201055] CPU1: shutdown
> > [  217.203779] psci: CPU1 killed.
> > [  400.506869] audit: type=1006 audit(1573738201.312:3): pid=1332
> > uid=0 old-aui1 [ 4000.600430] audit: type=1006
> > audit(1573741801.408:4): pid=1352 uid=0 old-aui1 [ 7600.687496] audit:
> > type=1006 audit(1573745401.492:5): pid=1371 uid=0 old-aui1
> >
> >
> >          cpuhp/1-13    [001] ....   217.200231:
> cpu_frequency_irq_work_sync: cpu_id=1, cpu=-1, flag=0
> >      smp_test.sh-741   [002] ...1   217.203770:
> cpu_frequency_irq_claim: cpu_id=2, flag=7
> >      smp_test.sh-741   [002] d.h1   217.206873:
> cpu_frequency_irq_run_list: cpu_id=2, flag=4
> >      smp_test.sh-741   [002] ...1   217.206893:
> cpu_frequency_irq_claim: cpu_id=2, flag=7
> >      smp_test.sh-741   [002] dNh.   217.208222:
> cpu_frequency_irq_run_list: cpu_id=2, flag=4
> >          cpuhp/2-18    [002] ....   217.248206:
> cpu_frequency_irq_work_sync: cpu_id=2, cpu=1, flag=3
> >
> > [Anson] this time, the irq work is pending on CPU1 which is offline.
> >
> >          kauditd-31    [000] ...1   400.519304:
> cpu_frequency_irq_claim: cpu_id=0, flag=7
> >           <idle>-0     [000] dNh1   400.520231:
> cpu_frequency_irq_run_list: cpu_id=0, flag=4
> >          kauditd-31    [003] ...1  4000.612845:
> cpu_frequency_irq_claim: cpu_id=3, flag=7
> >            crond-1352  [003] d.h.  4000.616221:
> cpu_frequency_irq_run_list: cpu_id=3, flag=4
> >          kauditd-31    [000] ...1  7600.699988:
> cpu_frequency_irq_claim: cpu_id=0, flag=7
> >           <idle>-0     [000] dNh1  7600.700205:
> cpu_frequency_irq_run_list: cpu_id=0, flag=4
> > root@imx8qxpmek:~#
> >
> >

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-05 15:52                           ` Rafael J. Wysocki
  2019-12-09 10:31                             ` Peng Fan
@ 2019-12-09 10:37                             ` Anson Huang
  2019-12-09 10:56                               ` Anson Huang
  1 sibling, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-12-09 10:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peng Fan, Viresh Kumar, Jacky Bai, linux-pm

Hi, Rafael/Viresh
	We noticed that v5.4 and v4.19 behave differently. On v4.19, dbs_update_util_handler() appears to run ONLY on the CPU whose frequency needs to be updated, never on the CPU that is going offline. On v5.4, in most cases dbs_update_util_handler() runs on the CPU that is going offline, and cpufreq_this_cpu_can_update() can NOT prevent this, because policy->dvfs_possible_from_any_cpu is always TRUE for platforms using the cpufreq-dt driver. In addition, policy->cpus is ONLY updated after cpufreq_dbs_governor_stop() has finished. So after irq_work_sync() has been called on the CPU going offline, dbs_update_util_handler() can still run on that CPU while it has NOT been powered down yet, which makes the earlier irq_work_sync() insufficient.
	This leaves a window in which irq work can be queued on the CPU going offline; if that CPU stops executing before the irq work has run, the issue occurs.
	Could this behavior be caused by a change in kernel/sched/*.c ?

static void dbs_update_util_handler(struct update_util_data *data, u64 time,
                                    unsigned int flags)
{
        struct cpu_dbs_info *cdbs = container_of(data, struct cpu_dbs_info, update_util);
        struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
        u64 delta_ns, lst;

        if (!cpufreq_this_cpu_can_update(policy_dbs->policy))
                return;
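
To make the point about the check concrete, here is a minimal userspace sketch. It assumes that the v5.4 check reduces to "this CPU is in policy->cpus, or dvfs_possible_from_any_cpu is set"; struct policy_model, MODEL_NR_CPUS and model_this_cpu_can_update() are hypothetical names for illustration only, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace stand-in for the two cpufreq_policy fields involved. */
struct policy_model {
	unsigned long cpus;              /* bit n set: CPU n is in policy->cpus */
	bool dvfs_possible_from_any_cpu; /* set by the driver */
};

/*
 * Assumed shape of the check: the calling CPU may update the frequency if it
 * belongs to the policy, or if the driver allows updates from any CPU.
 * Nothing in this model looks at whether the calling CPU is going offline.
 */
static bool model_this_cpu_can_update(const struct policy_model *p, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);
	return ((p->cpus >> this_cpu) & 1UL) || p->dvfs_possible_from_any_cpu;
}
```

With cpus = 0x7 (CPU3 already cleared from the mask) and dvfs_possible_from_any_cpu set, the model still returns true for CPU3. Under this assumption the handler is only rejected on CPU3 when the flag is clear, which would match what we see on the -dt platforms.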


Anson

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-09 10:37                             ` Anson Huang
@ 2019-12-09 10:56                               ` Anson Huang
  2019-12-09 11:23                                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-12-09 10:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peng Fan, Viresh Kumar, Jacky Bai, linux-pm

I forgot to mention that the patch below, applied on v5.4, easily triggers the panic() on our platforms. I think this is unexpected: policy->cpus has already been updated after the governor stop, yet irq work is still about to be queued on that CPU.

static void dbs_update_util_handler(struct update_util_data *data, u64 time, unsigned int flags)
+       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
+               panic("...irq work on offline cpu %d\n", smp_processor_id());
        irq_work_queue(&policy_dbs->irq_work);
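
For reading the flag= values in the traces quoted in this thread, the following userspace sketch may help. It assumes the bit layout of include/linux/irq_work.h around v5.4 (PENDING = bit 0, BUSY = bit 1, LAZY = bit 2); model_irq_work_state() and model_sync_would_spin() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed bit layout, as in include/linux/irq_work.h around v5.4. */
#define IRQ_WORK_PENDING 0x1UL
#define IRQ_WORK_BUSY    0x2UL
#define IRQ_WORK_LAZY    0x4UL
#define IRQ_WORK_CLAIMED (IRQ_WORK_PENDING | IRQ_WORK_BUSY)

/* Describe the claim state of an irq_work; the LAZY bit is ignored here. */
static const char *model_irq_work_state(unsigned long flags)
{
	switch (flags & IRQ_WORK_CLAIMED) {
	case 0:
		return "idle";
	case IRQ_WORK_BUSY:
		return "running";
	case IRQ_WORK_CLAIMED:
		return "claimed";
	default:
		return "unexpected"; /* PENDING without BUSY */
	}
}

/* irq_work_sync() keeps spinning for as long as the BUSY bit is set. */
static bool model_sync_would_spin(unsigned long flags)
{
	return (flags & IRQ_WORK_BUSY) != 0;
}
```

Under these assumptions, flag=0 and flag=4 in the traces are idle works (4 being a lazy one), while flag=3 and flag=7 are claimed works that have not run yet. A work left at flag=3 on a CPU that never runs its list again keeps irq_work_sync() spinning, which is consistent with the hang seen here.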

Anson

> -----Original Message-----
> From: Anson Huang
> Sent: Monday, December 9, 2019 6:38 PM
> To: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>; Peng Fan <peng.fan@nxp.com>;
> Viresh Kumar <viresh.kumar@linaro.org>; Jacky Bai <ping.bai@nxp.com>;
> linux-pm@vger.kernel.org
> Subject: RE: About CPU hot-plug stress test failed in cpufreq driver
> 
> Hi, Rafael/Viresh
> 	We noticed some different behaviors on v5.4 and v4.19, on v4.19,
> dbs_update_util_handler() looks like ONLY run on the CPU whose frequency
> needs to be updated (never on the CPU which is being offline), but on v5.4,
> we found in most cases, dbs_update_util_handler() will be assigned to the
> CPU being offline, and the cpufreq_this_cpu_can_update() can NOT prevent
> this scenario, because " policy->dvfs_possible_from_any_cpu " is always
> TRUE for drivers using -dt driver, and also, policy-cpus ONLY be updated after
> cpufreq_dbs_governor_stop() finished, that means after irq_work_sync() is
> called on the CPU being offline, there are still dbs_update_util_handler()
> assigned to that CPU being offline but NOT power down yet, so the previous
> irq_work_sync() does NOT make sense enough.
> 	That will cause an issue window of irq work being queued to the CPU
> being offline, but maybe that CPU will stop execution at anytime before the
> irq work is finished, then issue happened.
> 	Could this behavior caused by something changed in
> kernel/sched/*.c ?
> 
> 273 static void dbs_update_util_handler(struct update_util_data *data, u64
> time,
> 274                                     unsigned int flags)
> 275 {
> 276         struct cpu_dbs_info *cdbs = container_of(data, struct cpu_dbs_info,
> update_util);
> 277         struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
> 278         u64 delta_ns, lst;
> 279
> 280         if (!cpufreq_this_cpu_can_update(policy_dbs->policy))
> 281                 return;
> 
> 
> Anson
> 
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> >
> > On Thu, Dec 5, 2019 at 2:18 PM Anson Huang <anson.huang@nxp.com>
> > wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > Sent: Thursday, December 5, 2019 6:48 PM
> > > > To: Anson Huang <anson.huang@nxp.com>
> > > > Cc: Viresh Kumar <viresh.kumar@linaro.org>; Jacky Bai
> > > > <ping.bai@nxp.com>; linux-pm@vger.kernel.org
> > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > driver
> > > >
> > > > On Thursday, December 5, 2019 9:53:20 AM CET Anson Huang wrote:
> > > > > Hi, Rafael
> > > > >     This issue is very weird, the irq_work used in
> > > > > cpufreq_governor.c is
> > > > very
> > > > >     simple, ONLY one entry to claim the irq_work, and
> > > > cpufreq_governor's irq_work
> > > > >     is a private irq_work structure, no other drivers use it. I
> > > > > added some
> > > > trace
> > > > >     event in cpufreq_governor.c and irq_work.c, every time, the
> > > > > issue
> > > > happened at
> > > > >     the point of CPU1/2/3 all off, and CPU1 start ON line, but
> > > > > when CPU1
> > > > tried to
> > > > >     sync the irq_work in cpufreq_dbs_governor_stop(), the
> > > > > irq_work
> > > > shows that
> > > > >     previous work is pending on CPU3 which is offline, I also
> > > > > had the
> > > > trace event
> > > > >     in irq_work_claim(),  but no any log shows the
> > > > > cpufreq_governor
> > > > irq_work is
> > > > >     claimed on CPU3 after CPU3 offline, below is the debug patch
> > > > > I added
> > > > and the
> > > > >     log on 2 consoles:
> > > > >     If I understand it correctly, the irq work used in
> > > > > cpufreq_governor
> > > > ONLY has
> > > > >     one entry of calling irq_work_queue() which will be ONLY
> > > > > claimed on
> > > > the CPU
> > > > >     calling the irq_work_queue(), but from trace result, I have
> > > > > NOT see
> > > > where
> > > > >     CPU3 could call irq_work_queue() after it finishes the irq
> > > > > work sync
> > > > before
> > > > >     offline.
> > > >
> > > > Right.
> > > >
> > > > Which means that this particular irq_work only runs on the CPU
> > > > that has run
> > > > irq_work_queue() for it.
> > > >
> > > > > >     Could it be something wrong related to cache maintenance during
> > > > > > CPU hotplug?
> > > >
> > > > I'm not sure what is going on, but I do agree that it is weird
> > > > enough. :-)
> > > >
> > > > [cut]
> > > >
> > > > > LOG on console 1 which does CPU1/2/3 offline and online stress test:
> > > > > CPUHotplug: 4575 times remaining [ 1047.401185] CPU1: shutdown [
> > > > > 1047.403917] psci: CPU1 killed.
> > > > > [ 1047.449153] CPU2: shutdown
> > > > > [ 1047.451880] psci: CPU2 killed.
> > > > > [ 1047.501131] CPU3: shutdown
> > > > > [ 1047.503857] psci: CPU3 killed.
> > > > > [ 1048.541939] Detected VIPT I-cache on CPU1 [ 1048.541983] GICv3:
> > > > > CPU1: found redistributor 1 region 0:0x0000000051b20000 [
> > > > > 1048.542050]
> > > > > CPU1: Booted secondary processor 0x0000000001 [0x410fd042] [
> > > > > 1048.585024] Detected VIPT I-cache on CPU2 [ 1048.585061] GICv3:
> > CPU2:
> > > > > found redistributor 2 region 0:0x0000000051b40000 [ 1048.585121]
> > CPU2:
> > > > > Booted secondary processor 0x0000000002 [0x410fd042] [
> > > > > 1048.645070] Detected VIPT I-cache on CPU3 [ 1048.645112] GICv3:
> > > > > CPU3: found redistributor 3 region 0:0x0000000051b60000
> > [ 1048.645181] CPU3:
> > > > > Booted secondary processor 0x0000000003 [0x410fd042]
> > > > > CPUHotplug: 4574 times remaining [ 1049.769187] CPU1: shutdown [
> > > > > 1049.771913] psci: CPU1 killed.
> > > > > [ 1049.809126] CPU2: shutdown
> > > > > [ 1049.811856] psci: CPU2 killed.
> > > > > [ 1049.853135] CPU3: shutdown
> > > > > [ 1049.855868] psci: CPU3 killed.
> > > > >
> > > > > Waiting here forever.....
> > > > >
> > > > > LOG on console 2 which enables the trace events I added upper:
> > > > >              sed-4591  [003] d..4  1049.705561: cpu_frequency_irq_claim:
> > > > cpu_id=3, flag=3
> > > > >              sed-4591  [003] dNh1  1049.705604:
> > > > > cpu_frequency_irq_run_list: cpu_id=3, flag=0
> > > >
> > > > So here CPU3 runs an IRQ work, presumably the cpufreq governor's one.
> > > >
> > > > After that its raised_list should be empty and it doesn't claim
> > > > any IRQ works going forward.
> > > >
> > > > >           <idle>-0     [001] d.s2  1049.716308: cpu_frequency_irq_work:
> > > > cpu_id=1, cpu=-1
> > > > >           <idle>-0     [001] d.s2  1049.716319: cpu_frequency_irq_claim:
> > > > cpu_id=1, flag=3
> > > > >           <idle>-0     [001] dNH2  1049.716338: cpu_frequency_irq_run_list:
> > > > cpu_id=1, flag=0
> > > >
> > > > And now CPU1 runs the cpufreq governor IRQ work, so it sets
> > > > work->cpu to 1 and then to -1 (when flushing raised_list).
> > > >
> > > > >           <idle>-0     [002] d.s2  1049.728303: cpu_frequency_irq_work:
> > > > cpu_id=2, cpu=-1
> > > > >           <idle>-0     [002] d.s2  1049.728307: cpu_frequency_irq_claim:
> > > > cpu_id=2, flag=3
> > > > >           <idle>-0     [002] dNH2  1049.728320: cpu_frequency_irq_run_list:
> > > > cpu_id=2, flag=0
> > > > >           <idle>-0     [001] d.s2  1049.740305: cpu_frequency_irq_work:
> > > > cpu_id=1, cpu=-1
> > > > >           <idle>-0     [001] d.s2  1049.740307: cpu_frequency_irq_claim:
> > > > cpu_id=1, flag=3
> > > > >           <idle>-0     [001] dNH2  1049.740319: cpu_frequency_irq_run_list:
> > > > cpu_id=1, flag=0
> > > > >           <idle>-0     [001] d.s2  1049.752305: cpu_frequency_irq_work:
> > > > cpu_id=1, cpu=-1
> > > > >           <idle>-0     [001] d.s2  1049.752307: cpu_frequency_irq_claim:
> > > > cpu_id=1, flag=3
> > > > >           <idle>-0     [001] dNH2  1049.752316: cpu_frequency_irq_run_list:
> > > > cpu_id=1, flag=0
> > > > >          cpuhp/1-13    [001] ....  1049.768340:
> > cpu_frequency_irq_work_sync:
> > > > cpu_id=1, cpu=-1, flag=0
> > > > >          cpuhp/1-13    [001] d..4  1049.768681: cpu_frequency_irq_work:
> > > > cpu_id=1, cpu=-1
> > > > >          cpuhp/1-13    [001] d..4  1049.768683: cpu_frequency_irq_claim:
> > > > cpu_id=1, flag=3
> > > > >          cpuhp/1-13    [001] dNh1  1049.768698:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=1, flag=0
> > > > >      smp_test.sh-734   [000] ...1  1049.771903: cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=7
> > > > >      smp_test.sh-734   [000] dNh1  1049.775009:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=4
> > > > >      smp_test.sh-734   [000] ...1  1049.776084: cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=7
> > > > >      smp_test.sh-734   [000] dNh.  1049.776392:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=4
> > > > >      smp_test.sh-734   [000] d..2  1049.779093:
> cpu_frequency_irq_work:
> > > > cpu_id=0, cpu=-1
> > > > >      smp_test.sh-734   [000] d..2  1049.779103:
> cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=3
> > > > >           <idle>-0     [000] dNh2  1049.779162: cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=0
> > > > >           <idle>-0     [000] d.s2  1049.792305: cpu_frequency_irq_work:
> > > > cpu_id=0, cpu=-1
> > > > >           <idle>-0     [000] d.s2  1049.792315: cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=3
> > > > >           <idle>-0     [000] dNH2  1049.792329: cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=0
> > > > >          cpuhp/2-18    [002] ....  1049.808315:
> > cpu_frequency_irq_work_sync:
> > > > cpu_id=2, cpu=-1, flag=0
> > > > >          cpuhp/2-18    [002] d..4  1049.808642: cpu_frequency_irq_work:
> > > > cpu_id=2, cpu=-1
> > > > >          cpuhp/2-18    [002] d..4  1049.808645: cpu_frequency_irq_claim:
> > > > cpu_id=2, flag=3
> > > > >          cpuhp/2-18    [002] dNh1  1049.808658:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=2, flag=0
> > > > >      smp_test.sh-734   [000] ...1  1049.811848: cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=7
> > > > >      smp_test.sh-734   [000] dNh1  1049.814949:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=4
> > > > >      smp_test.sh-734   [000] ...1  1049.815988: cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=7
> > > > >      smp_test.sh-734   [000] dNh1  1049.816321:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=4
> > > > >      smp_test.sh-734   [000] d..3  1049.818936:
> cpu_frequency_irq_work:
> > > > cpu_id=0, cpu=-1
> > > > >      smp_test.sh-734   [000] d..3  1049.818946:
> cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=3
> > > > >      smp_test.sh-734   [000] dNh2  1049.818973:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=0
> > > > >           <idle>-0     [000] d.s4  1049.832308: cpu_frequency_irq_work:
> > > > cpu_id=0, cpu=-1
> > > > >           <idle>-0     [000] d.s4  1049.832317: cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=3
> > > > >           <idle>-0     [000] dNH3  1049.832332: cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=0
> > > > >          cpuhp/3-23    [003] ....  1049.852314:
> > cpu_frequency_irq_work_sync:
> > > > cpu_id=3, cpu=-1, flag=0
> > > > >
> > > > > [Anson] when CPU3 goes offline, the irq work sync succeeds and no
> > > > > irq work is pending any more;
> > > > >
> > > > >      smp_test.sh-734   [000] ...1  1049.855859: cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=7
> > > > >      smp_test.sh-734   [000] dNh1  1049.858958:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=4
> > > > >      smp_test.sh-734   [000] ...1  1049.859990: cpu_frequency_irq_claim:
> > > > cpu_id=0, flag=7
> > > > >      smp_test.sh-734   [000] dNh.  1049.860346:
> > cpu_frequency_irq_run_list:
> > > > cpu_id=0, flag=4
> > > > >           <idle>-0     [001] d.h1  1050.896329: cpu_frequency_irq_run_list:
> > > > cpu_id=1, flag=4
> > > > >          cpuhp/1-13    [001] ....  1050.916319:
> > cpu_frequency_irq_work_sync:
> > > > cpu_id=1, cpu=3, flag=3
> > > > >
> > > > > [Anson] we can see that when CPU1 starts coming online and tries to sync
> > > > > the irq work, it finds the work pending on CPU3, which is offline, and in
> > > > > this period no irq work was claimed by cpufreq_governor,
> > > >
> > > > So I'm wondering how it is possible at all that work->cpu value is
> > > > 3 at this point.
> > > >
> > > > The last CPU that wrote to work->cpu was CPU0 and the written
> > > > value was -1, and
> > > > CPU3 saw that value when it was running irq_work_sync().
> > > >
> > > > There is no sane way by which work->cpu can be equal to 3 from
> > > > CPU1's perspective, because the last value written to it by CPU1
> > > > itself was -1 and the last value written to it by any other CPU also was -1.
> > > >
> > > > Moreover, after CPU3 had updated it last time (and the last value
> > > > written to it by CPU3 had been -1), other CPUs, *including* CPU1,
> > > > updated it too (and that for multiple times).
> > > >
> > > > So the only theory that can explain why CPU1 sees 3 in there when
> > > > it is going online appears to be some silent memory corruption.
> > > >
> > > > That said, have you tried to make the READ_ONCE() change suggested
> > > > a while ago?
> > >
> > > The patch below, using READ_ONCE(), does NOT help, if I made the change
> > > correctly:
> >
> > OK
> >
> > > @@ -212,7 +208,7 @@ void irq_work_sync(struct irq_work *work)  {
> > >         lockdep_assert_irqs_enabled();
> > >
> > > -       while (work->flags & IRQ_WORK_BUSY)
> > > +       while (READ_ONCE(work->flags) & IRQ_WORK_BUSY)
> >
> > You also may try using test_bit() instead of the raw read, but anyway
> > at this point I would start talking to the arch/HW people if I were you.
> >
> > >                 cpu_relax();
> > >  }
> > >
> > > LOG:
> > > CPUHotplug: 4937 times remaining
> > > [  214.837047] CPU1: shutdown
> > > [  214.839781] psci: CPU1 killed.
> > > [  214.877041] CPU2: shutdown
> > > [  214.879767] psci: CPU2 killed.
> > > [  214.917026] CPU3: shutdown
> > > [  214.919758] psci: CPU3 killed.
> > > [  215.957816] Detected VIPT I-cache on CPU1 [  215.957860] GICv3:
> > > CPU1: found redistributor 1 region 0:0x0000000051b20000 [
> > > 215.957930]
> > > CPU1: Booted secondary processor 0x0000000001 [0x410fd042] [
> > > 216.001025] Detected VIPT I-cache on CPU2 [  216.001064] GICv3: CPU2:
> > > found redistributor 2 region 0:0x0000000051b40000 [  216.001126] CPU2:
> > > Booted secondary processor 0x0000000002 [0x410fd042] [  216.068960]
> > > Detected VIPT I-cache on CPU3 [  216.069004] GICv3: CPU3: found
> > > redistributor 3 region 0:0x0000000051b60000 [  216.069076] CPU3:
> > > Booted secondary processor 0x0000000003 [0x410fd042]
> > > CPUHotplug: 4936 times remaining
> > > [  217.201055] CPU1: shutdown
> > > [  217.203779] psci: CPU1 killed.
> > > [  400.506869] audit: type=1006 audit(1573738201.312:3): pid=1332
> > > uid=0 old-aui1 [ 4000.600430] audit: type=1006
> > > audit(1573741801.408:4): pid=1352 uid=0 old-aui1 [ 7600.687496] audit:
> > > type=1006 audit(1573745401.492:5): pid=1371 uid=0 old-aui1
> > >
> > >
> > >          cpuhp/1-13    [001] ....   217.200231: cpu_frequency_irq_work_sync:
> > cpu_id=1, cpu=-1, flag=0
> > >      smp_test.sh-741   [002] ...1   217.203770: cpu_frequency_irq_claim:
> > cpu_id=2, flag=7
> > >      smp_test.sh-741   [002] d.h1   217.206873: cpu_frequency_irq_run_list:
> > cpu_id=2, flag=4
> > >      smp_test.sh-741   [002] ...1   217.206893: cpu_frequency_irq_claim:
> > cpu_id=2, flag=7
> > >      smp_test.sh-741   [002] dNh.   217.208222: cpu_frequency_irq_run_list:
> > cpu_id=2, flag=4
> > >          cpuhp/2-18    [002] ....   217.248206: cpu_frequency_irq_work_sync:
> > cpu_id=2, cpu=1, flag=3
> > >
> > > [Anson] this time, the irq work is pending on CPU1 which is offline.
> > >
> > >          kauditd-31    [000] ...1   400.519304: cpu_frequency_irq_claim:
> > cpu_id=0, flag=7
> > >           <idle>-0     [000] dNh1   400.520231: cpu_frequency_irq_run_list:
> > cpu_id=0, flag=4
> > >          kauditd-31    [003] ...1  4000.612845: cpu_frequency_irq_claim:
> > cpu_id=3, flag=7
> > >            crond-1352  [003] d.h.  4000.616221: cpu_frequency_irq_run_list:
> > cpu_id=3, flag=4
> > >          kauditd-31    [000] ...1  7600.699988: cpu_frequency_irq_claim:
> > cpu_id=0, flag=7
> > >           <idle>-0     [000] dNh1  7600.700205: cpu_frequency_irq_run_list:
> > cpu_id=0, flag=4
> > > root@imx8qxpmek:~#
> > >
> > >
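
For readers following the traces above, the flag= values can be decoded
with a small userspace helper. This is only a sketch: the bit values are
my recollection of include/linux/irq_work.h around v5.4 (an assumption,
please check your own tree), and irq_work_state() and
irq_work_sync_would_spin() are hypothetical names, not kernel functions.

```c
#include <assert.h>

/* Assumed bit layout (recalled from include/linux/irq_work.h, ~v5.4). */
#define IRQ_WORK_PENDING 0x1u
#define IRQ_WORK_BUSY    0x2u
#define IRQ_WORK_LAZY    0x4u

/* Hypothetical helper: name the state encoded in a traced flag value. */
static const char *irq_work_state(unsigned int flags)
{
	switch (flags & (IRQ_WORK_PENDING | IRQ_WORK_BUSY)) {
	case 0:
		return "idle";
	case IRQ_WORK_PENDING | IRQ_WORK_BUSY:
		return "claimed";	/* queued, callback not run yet */
	case IRQ_WORK_BUSY:
		return "running";	/* callback in progress */
	default:
		return "pending-only";	/* not expected in practice */
	}
}

/*
 * Hypothetical helper: irq_work_sync() loops while the BUSY bit is set,
 * so this is the condition under which the caller keeps spinning.
 */
static int irq_work_sync_would_spin(unsigned int flags)
{
	return (flags & IRQ_WORK_BUSY) != 0;
}
```

Under these assumptions flag=3 and flag=7 decode as "claimed" (7 also
has the LAZY bit), while flag=0 and flag=4 decode as "idle". The
cpu_frequency_irq_work_sync lines that report flag=3 are therefore the
ones where irq_work_sync() cannot make progress.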

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-09 10:56                               ` Anson Huang
@ 2019-12-09 11:23                                 ` Rafael J. Wysocki
  2019-12-09 12:32                                   ` Anson Huang
  0 siblings, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-09 11:23 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peng Fan, Viresh Kumar,
	Jacky Bai, linux-pm

On Mon, Dec 9, 2019 at 11:57 AM Anson Huang <anson.huang@nxp.com> wrote:
>
> Forgot to mention that the patch below, applied on v5.4, can easily reproduce the panic() on our platforms, which I think is unexpected: policy->cpus has already been updated after the governor stop, but irq work is still being queued on that CPU.
>
> static void dbs_update_util_handler(struct update_util_data *data, u64 time, unsigned int flags)
> +       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
> +               panic("...irq work on offline cpu %d\n", smp_processor_id());
>         irq_work_queue(&policy_dbs->irq_work);

Yes, that is unexpected.

In cpufreq_offline(), we have:

    down_write(&policy->rwsem);
    if (has_target())
        cpufreq_stop_governor(policy);

    cpumask_clear_cpu(cpu, policy->cpus);

and cpufreq_stop_governor() calls policy->governor->stop(policy) which
is cpufreq_dbs_governor_stop().

That calls gov_clear_update_util(policy_dbs->policy) first, which
invokes cpufreq_remove_update_util_hook() for each CPU in policy->cpus
and synchronizes RCU, so after that point none of the policy->cpus is
expected to run dbs_update_util_handler().

policy->cpus is updated next and the governor is started again with
the new policy->cpus.  Because the offline CPU is not there, it is not
expected to run dbs_update_util_handler() again.

Do you only get the original error when one of the CPUs goes back online?
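
The ordering described above can be written down as a small userspace
model. It is a sketch only: struct offline_model and the *_model()
functions are hypothetical stand-ins, a plain bitmask replaces
policy->cpus, and a bool array replaces the per-CPU
cpufreq_update_util_data pointers. RCU and locking are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical model of the state touched by cpufreq_offline(). */
struct offline_model {
	unsigned int cpus;			/* models policy->cpus */
	bool hook_installed[MODEL_NR_CPUS];	/* models a non-NULL update_util pointer */
};

/* Models gov_clear_update_util(): drop the hook of every CPU in the policy. */
static void governor_stop_model(struct offline_model *p)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (p->cpus & (1u << cpu))
			p->hook_installed[cpu] = false;
	/* The kernel also waits for an RCU grace period at this point. */
}

/* Models governor start: install a hook for every CPU still in the policy. */
static void governor_start_model(struct offline_model *p)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (p->cpus & (1u << cpu))
			p->hook_installed[cpu] = true;
}

/* Same order as in the text: stop, clear the CPU from the mask, restart. */
static void cpu_offline_model(struct offline_model *p, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	governor_stop_model(p);
	p->cpus &= ~(1u << cpu);
	if (p->cpus)
		governor_start_model(p);
}
```

Starting from cpus = 0xf with all hooks installed,
cpu_offline_model(&p, 3) leaves cpus = 0x7 and no hook on CPU3, which is
why CPU3 is not expected to run its own update handler afterwards. The
model says nothing about a handler that belongs to another CPU being
invoked on CPU3.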

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-09 11:23                                 ` Rafael J. Wysocki
@ 2019-12-09 12:32                                   ` Anson Huang
  2019-12-09 12:44                                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-12-09 12:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peng Fan, Viresh Kumar, Jacky Bai, linux-pm



From Anson's iPhone 6


> On Dec 9, 2019, at 19:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> 
>> On Mon, Dec 9, 2019 at 11:57 AM Anson Huang <anson.huang@nxp.com> wrote:
>> 
>> Forgot to mention that the patch below, applied on v5.4, can easily reproduce the panic() on our platforms, which I think is unexpected: policy->cpus has already been updated after the governor stop, but irq work is still being queued on that CPU.
>> 
>> static void dbs_update_util_handler(struct update_util_data *data, u64 time, unsigned int flags)
>> +       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
>> +               panic("...irq work on offline cpu %d\n", smp_processor_id());
>>        irq_work_queue(&policy_dbs->irq_work);
> 
> Yes, that is unexpected.
> 
> In cpufreq_offline(), we have:
> 
>    down_write(&policy->rwsem);
>    if (has_target())
>        cpufreq_stop_governor(policy);
> 
>    cpumask_clear_cpu(cpu, policy->cpus);
> 
> and cpufreq_stop_governor() calls policy->governor->stop(policy) which
> is cpufreq_dbs_governor_stop().
> 
> That calls gov_clear_update_util(policy_dbs->policy) first, which
> invokes cpufreq_remove_update_util_hook() for each CPU in policy->cpus
> and synchronizes RCU, so after that point none of the policy->cpus is
> expected to run dbs_update_util_handler().
> 
> policy->cpus is updated next and the governor is started again with
> the new policy->cpus.  Because the offline CPU is not there, it is not
> expected to run dbs_update_util_handler() again.
> 
> Do you only get the original error when one of the CPUs goes back online?

No, sometimes I also get this error while a CPU is going offline.

But the point is NOT that dbs_update_util_handler() is called during governor stop; it is that this function runs on a CPU which has already finished the governor stop function. I thought the original expectation was that this function is ONLY executed on a CPU which needs frequency scaling. Is this correct? v4.19 follows this expectation while v5.4 does NOT.

The only thing I can imagine is that the changes in the kernel/sched/ folder cause this difference, but I still need more time to figure out which changes cause it. If you have any suggestions, please advise, thanks!

Anson

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-09 12:32                                   ` Anson Huang
@ 2019-12-09 12:44                                     ` Rafael J. Wysocki
  2019-12-09 14:18                                       ` Anson Huang
  2019-12-10  5:53                                       ` Peng Fan
  0 siblings, 2 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-09 12:44 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Peng Fan, Viresh Kumar,
	Jacky Bai, linux-pm

On Mon, Dec 9, 2019 at 1:32 PM Anson Huang <anson.huang@nxp.com> wrote:
>
>
>
> From Anson's iPhone 6
>
>
> > On Dec 9, 2019, at 19:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> >> On Mon, Dec 9, 2019 at 11:57 AM Anson Huang <anson.huang@nxp.com> wrote:
> >>
> >> Forgot to mention that the patch below, applied on v5.4, can easily reproduce the panic() on our platforms, which I think is unexpected: policy->cpus has already been updated after the governor stop, but irq work is still being queued on that CPU.
> >>
> >> static void dbs_update_util_handler(struct update_util_data *data, u64 time, unsigned int flags)
> >> +       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
> >> +               panic("...irq work on offline cpu %d\n", smp_processor_id());
> >>        irq_work_queue(&policy_dbs->irq_work);
> >
> > Yes, that is unexpected.
> >
> > In cpufreq_offline(), we have:
> >
> >    down_write(&policy->rwsem);
> >    if (has_target())
> >        cpufreq_stop_governor(policy);
> >
> >    cpumask_clear_cpu(cpu, policy->cpus);
> >
> > and cpufreq_stop_governor() calls policy->governor->stop(policy) which
> > is cpufreq_dbs_governor_stop().
> >
> > That calls gov_clear_update_util(policy_dbs->policy) first, which
> > invokes cpufreq_remove_update_util_hook() for each CPU in policy->cpus
> > and synchronizes RCU, so after that point none of the policy->cpus is
> > expected to run dbs_update_util_handler().
> >
> > policy->cpus is updated next and the governor is started again with
> > the new policy->cpus.  Because the offline CPU is not there, it is not
> > expected to run dbs_update_util_handler() again.
> >
> > Do you only get the original error when one of the CPUs goes back online?
>
> No, sometimes I also get this error while a CPU is going offline.
>
> But the point is NOT that dbs_update_util_handler() is called during governor stop;
> it is that this function runs on a CPU which has already finished the governor stop
> function,

Yes, it is, and that should not be possible as per the above.

The offline CPU is not there in policy->cpus when
cpufreq_dbs_governor_start() is called for the policy, so its
cpufreq_update_util_data pointer is not set (it is NULL at that time).
Therefore it is not expected to run dbs_update_util_handler() until it
is turned back online.

> I thought the original expectation was that this function is ONLY executed on a CPU which needs frequency scaling.
> Is this correct?

Yes, it is.

> v4.19 follows this expectation while v5.4 does NOT.

As per the kernel code, they both do.

> The only thing I can imagine is that the changes in the kernel/sched/ folder cause this difference, but I still need more time to figure out which changes cause it. If you have any suggestions, please advise, thanks!

The CPU offline/online (hotplug) rework was done after 4.19 IIRC and
that changed the way online works.  Now, it runs on the CPU going
online and previously it ran on the CPU "asking" the other one to go
online.  That may be what makes the difference (if my recollection of
the time frame is correct).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-09 12:44                                     ` Rafael J. Wysocki
@ 2019-12-09 14:18                                       ` Anson Huang
  2019-12-10  5:39                                         ` Anson Huang
  2019-12-10  5:53                                       ` Peng Fan
  1 sibling, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-12-09 14:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peng Fan, Viresh Kumar, Jacky Bai, linux-pm

Could it be caused by "policy->dvfs_possible_from_any_cpu" always being TRUE, so that platforms using the cpufreq-dt driver have this issue? I think I made a mistake: v4.19 does NOT show the issue on i.MX platforms because we don't use the cpufreq-dt driver there, while on v5.4 we switched to it. I can verify tomorrow whether v4.19 also has the issue by forcing policy->dvfs_possible_from_any_cpu to TRUE.

Hi, Viresh
       You tried to reproduce it on your side; does the platform you used use the cpufreq-dt driver or NOT?

From Anson's iPhone 6


>> On Dec 9, 2019, at 20:44, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> 
>> On Mon, Dec 9, 2019 at 1:32 PM Anson Huang <anson.huang@nxp.com> wrote:
>> 
>> 
>> 
>> From Anson's iPhone 6
>> 
>> 
>>>> On Dec 9, 2019, at 19:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>>> 
>>>> On Mon, Dec 9, 2019 at 11:57 AM Anson Huang <anson.huang@nxp.com> wrote:
>>>> 
>>>> Forgot to mention that the patch below, applied on v5.4, can easily reproduce the panic() on our platforms, which I think is unexpected: policy->cpus has already been updated after the governor stop, but irq work is still being queued on that CPU.
>>>> 
>>>> static void dbs_update_util_handler(struct update_util_data *data, u64 time, unsigned int flags)
>>>> +       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
>>>> +               panic("...irq work on offline cpu %d\n", smp_processor_id());
>>>>      irq_work_queue(&policy_dbs->irq_work);
>>> 
>>> Yes, that is unexpected.
>>> 
>>> In cpufreq_offline(), we have:
>>> 
>>>  down_write(&policy->rwsem);
>>>  if (has_target())
>>>      cpufreq_stop_governor(policy);
>>> 
>>>  cpumask_clear_cpu(cpu, policy->cpus);
>>> 
>>> and cpufreq_stop_governor() calls policy->governor->stop(policy) which
>>> is cpufreq_dbs_governor_stop().
>>> 
>>> That calls gov_clear_update_util(policy_dbs->policy) first, which
>>> invokes cpufreq_remove_update_util_hook() for each CPU in policy->cpus
>>> and synchronizes RCU, so after that point none of the policy->cpus is
>>> expected to run dbs_update_util_handler().
>>> 
>>> policy->cpus is updated next and the governor is started again with
>>> the new policy->cpus.  Because the offline CPU is not there, it is not
>>> expected to run dbs_update_util_handler() again.
>>> 
>>> Do you only get the original error when one of the CPUs goes back online?
>> 
>> No, sometimes I also got this error during a CPU is being offline.
>> 
>> But the point is NOT that dbs_update_util_handler() called during governor stop,
>> it is that this function is running on a CPU which already finish the governor stop
>> function,
> 
> Yes, it is, and which should not be possible as per the above.
> 
> The offline CPU is not there in policy->cpus when
> cpufreq_dbs_governor_start() is called for the policy, so its
> cpufreq_update_util_data pointer is not set (it is NULL at that time).
> Therefore it is not expected to run dbs_update_util_handler() until it
> is turned back online.
> 
>> I thought the original expectation is that this function ONLY be executed on the CPU which needs scaling frequency?
>> Is this correct?
> 
> Yes, it is.
> 
>> v4.19 follows this expectation while v5.4 is NOT.
> 
> As per the kernel code, they both do.
> 
>> The only thing I can image is the changes in kernel/sched/ folder cause this difference, but I still need more time to figure out what changes cause it, if you have any suggestion, please advise, thanks!
> 
> The CPU offline/online (hotplug) rework was done after 4.19 IIRC and
> that changed the way online works.  Now, it runs on the CPU going
> online and previously it ran on the CPU "asking" the other one to go
> online.  That may be what makes the difference (if my recollection of
> the time frame is correct).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-09 14:18                                       ` Anson Huang
@ 2019-12-10  5:39                                         ` Anson Huang
  0 siblings, 0 replies; 57+ messages in thread
From: Anson Huang @ 2019-12-10  5:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Peng Fan, Viresh Kumar, Jacky Bai, linux-pm, Ye Li

Hi, Rafael/Viresh
	One correction: v4.19 also has this case. We don't use the cpufreq-dt driver on v4.19, so "policy->dvfs_possible_from_any_cpu" is false, and then the cpufreq_this_cpu_can_update() check in dbs_update_util_handler() takes effect and makes the race much less likely; I am NOT sure whether it avoids the race completely.
	The current cpufreq_update_util() function allows a CPU to do the util update on behalf of another CPU. This is intentional, from patch 674e75411fc2 ("sched: cpufreq: Allow remote cpufreq callbacks"), but I think it may race with CPU hot-plug. If a CPU is going offline and has already finished cpufreq_dbs_governor_stop(), it can still run cpufreq_update_util() to do the util update for another online CPU, and then irq work is queued on this CPU, which is unexpected. Given that "policy->dvfs_possible_from_any_cpu" is always TRUE for the cpufreq-dt driver, dbs_update_util_handler() has no other check for this scenario, and the issue can happen.
	Do you think this race condition exists?
	I added the patch below for both schedutil and the other cpufreq governors; the test has passed > 5000 iterations and is still running.

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index d2b5f06..68421c7 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -273,7 +273,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
                transition_latency = CPUFREQ_ETERNAL;

        policy->cpuinfo.transition_latency = transition_latency;
-       policy->dvfs_possible_from_any_cpu = true;
+       //policy->dvfs_possible_from_any_cpu = true;

        dev_pm_opp_of_register_em(policy->cpus);

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 86800b4..cc5b4a0 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -140,6 +140,9 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
        if (!sugov_update_next_freq(sg_policy, time, next_freq))
                return;

+       if (!cpufreq_this_cpu_can_update(sg_policy->policy))
+               return;
+
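
To make the reasoning behind the patch easier to follow, here is a
userspace sketch of the check as it is described in this thread: a CPU
may update if it is in policy->cpus, or if the driver sets
dvfs_possible_from_any_cpu. It is a model under that assumption, not the
kernel source; struct dvfs_model and this_cpu_can_update_model() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two policy fields the check depends on. */
struct dvfs_model {
	unsigned int cpus;			/* models policy->cpus */
	bool dvfs_possible_from_any_cpu;	/* set to true by cpufreq-dt */
};

/*
 * Models cpufreq_this_cpu_can_update() as described above. this_cpu
 * stands in for smp_processor_id().
 */
static bool this_cpu_can_update_model(const struct dvfs_model *p, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < 32);
	return (p->cpus & (1u << this_cpu)) || p->dvfs_possible_from_any_cpu;
}
```

With cpus = 0x7 (CPU3 already removed) the model returns true for CPU3
when dvfs_possible_from_any_cpu is set and false when it is cleared. In
this model that is the difference between a CPU going offline being
allowed to queue the irq work and being turned away, and it is what
commenting out the assignment in cpufreq-dt.c changes.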

Anson

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> Could it be caused by “policy->dvfs_possible_from_any_cpu” always being
> TRUE, so that the platforms using cpufreq-dt driver will have such issue? I
> think I made a mistake that v4.19 does NOT have such issue on i.MX
> platforms is because we don’t use cpufreq-dt driver while v5.4 we switch to it.
> I can verify it again tomorrow to see if v4.19 also have such issue by forcing
> policy->dvfs_possible_from_any_cpu to be TRUE.
> 
> Hi, Viresh
>        You ever tried to reproduce it on your hand, the platform you used is
> using cpufreq-dt driver or NOT?
> 
> From Anson's iPhone 6
> 
> 
> >> On Dec 9, 2019, at 20:44, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>
> >> On Mon, Dec 9, 2019 at 1:32 PM Anson Huang <anson.huang@nxp.com>
> wrote:
> >>
> >>
> >>
> >> From Anson's iPhone 6
> >>
> >>
> >>>> On Dec 9, 2019, at 19:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>>>
> >>>> On Mon, Dec 9, 2019 at 11:57 AM Anson Huang
> <anson.huang@nxp.com> wrote:
> >>>>
> >>>> Forgot to mentioned that below patch on v5.4 can easily reproduce the
> panic() on our platforms which I think is unexpected, as the policy->cpus
> already be updated after governor stop, but still try to have irq work queued
> on it.
> >>>>
> >>>> static void dbs_update_util_handler(struct update_util_data *data,
> >>>> u64 time, unsigned int flags)
> >>>> +       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy-
> >cpus))
> >>>> +               panic("...irq work on offline cpu %d\n",
> >>>> + smp_processor_id());
> >>>>      irq_work_queue(&policy_dbs->irq_work);
> >>>
> >>> Yes, that is unexpected.
> >>>
> >>> In cpufreq_offline(), we have:
> >>>
> >>>  down_write(&policy->rwsem);
> >>>  if (has_target())
> >>>      cpufreq_stop_governor(policy);
> >>>
> >>>  cpumask_clear_cpu(cpu, policy->cpus);
> >>>
> >>> and cpufreq_stop_governor() calls policy->governor->stop(policy)
> >>> which is cpufreq_dbs_governor_stop().
> >>>
> >>> That calls gov_clear_update_util(policy_dbs->policy) first, which
> >>> invokes cpufreq_remove_update_util_hook() for each CPU in
> >>> policy->cpus and synchronizes RCU, so after that point none of the
> >>> policy->cpus is expected to run dbs_update_util_handler().
> >>>
> >>> policy->cpus is updated next and the governor is started again with
> >>> the new policy->cpus.  Because the offline CPU is not there, it is
> >>> not expected to run dbs_update_util_handler() again.
> >>>
> >>> Do you only get the original error when one of the CPUs goes back online?
> >>
> >> No, sometimes I also got this error during a CPU is being offline.
> >>
> >> But the point is NOT that dbs_update_util_handler() called during
> >> governor stop, it is that this function is running on a CPU which
> >> already finish the governor stop function,
> >
> > Yes, it is, and which should not be possible as per the above.
> >
> > The offline CPU is not there in policy->cpus when
> > cpufreq_dbs_governor_start() is called for the policy, so its
> > cpufreq_update_util_data pointer is not set (it is NULL at that time).
> > Therefore it is not expected to run dbs_update_util_handler() until it
> > is turned back online.
> >
> >> I thought the original expectation is that this function ONLY be executed
> on the CPU which needs scaling frequency?
> >> Is this correct?
> >
> > Yes, it is.
> >
> >> v4.19 follows this expectation while v5.4 does NOT.
> >
> > As per the kernel code, they both do.
> >
> >> The only thing I can imagine is that changes in the kernel/sched/ folder
> >> cause this difference, but I still need more time to figure out which
> >> changes cause it. If you have any suggestions, please advise, thanks!
> >
> > The CPU offline/online (hotplug) rework was done after 4.19 IIRC and
> > that changed the way online works.  Now, it runs on the CPU going
> > online and previously it ran on the CPU "asking" the other one to go
> > online.  That may be what makes the difference (if my recollection of
> > the time frame is correct).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-09 12:44                                     ` Rafael J. Wysocki
  2019-12-09 14:18                                       ` Anson Huang
@ 2019-12-10  5:53                                       ` Peng Fan
  2019-12-10  7:05                                         ` Viresh Kumar
  2019-12-10  8:12                                         ` Rafael J. Wysocki
  1 sibling, 2 replies; 57+ messages in thread
From: Peng Fan @ 2019-12-10  5:53 UTC (permalink / raw)
  To: Rafael J. Wysocki, Anson Huang
  Cc: Rafael J. Wysocki, Viresh Kumar, Jacky Bai, linux-pm

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Mon, Dec 9, 2019 at 1:32 PM Anson Huang <anson.huang@nxp.com>
> wrote:
> >
> > > On Dec 9, 2019, at 19:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > >> On Mon, Dec 9, 2019 at 11:57 AM Anson Huang <anson.huang@nxp.com>
> wrote:
> > >>
> > >> Forgot to mention that the patch below on v5.4 can easily reproduce
> > >> the panic() on our platforms, which I think is unexpected: policy->cpus
> > >> has already been updated after the governor stop, yet irq work is still
> > >> queued on that CPU.
> > >>
> > >> static void dbs_update_util_handler(struct update_util_data *data,
> > >>                                     u64 time, unsigned int flags)
> > >> +       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
> > >> +               panic("...irq work on offline cpu %d\n",
> > >> +                     smp_processor_id());
> > >>         irq_work_queue(&policy_dbs->irq_work);
> > >
> > > Yes, that is unexpected.
> > >
> > > In cpufreq_offline(), we have:
> > >
> > >    down_write(&policy->rwsem);
> > >    if (has_target())
> > >        cpufreq_stop_governor(policy);
> > >
> > >    cpumask_clear_cpu(cpu, policy->cpus);
> > >
> > > and cpufreq_stop_governor() calls policy->governor->stop(policy)
> > > which is cpufreq_dbs_governor_stop().
> > >
> > > That calls gov_clear_update_util(policy_dbs->policy) first, which
> > > invokes cpufreq_remove_update_util_hook() for each CPU in
> > > policy->cpus and synchronizes RCU, so after that point none of the
> > > policy->cpus is expected to run dbs_update_util_handler().
> > >
> > > policy->cpus is updated next and the governor is started again with
> > > the new policy->cpus.  Because the offline CPU is not there, it is
> > > not expected to run dbs_update_util_handler() again.
> > >
> > > Do you only get the original error when one of the CPUs goes back online?
> >
> > No, sometimes I also get this error while a CPU is going offline.
> >
> > But the point is NOT that dbs_update_util_handler() is called during
> > governor stop; it is that this function is running on a CPU which has
> > already finished the governor stop function.
> 
> Yes, it is, and which should not be possible as per the above.
> 
> The offline CPU is not there in policy->cpus when
> cpufreq_dbs_governor_start() is called for the policy, so its
> cpufreq_update_util_data pointer is not set (it is NULL at that time).
> Therefore it is not expected to run dbs_update_util_handler() until it is
> turned back online.
>
> > I thought the original expectation is that this function is ONLY executed
> > on the CPU which needs frequency scaling?
> > Is this correct?
> 
> Yes, it is.
> 
> > v4.19 follows this expectation while v5.4 does NOT.
> 
> As per the kernel code, they both do.

But per https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
cpu_of(rq) and smp_processor_id() are not necessarily the same.

When cpu_of(rq) is not equal to smp_processor_id(), dbs_update_util_handler()
will use irq_work_queue() to queue the work on smp_processor_id(), not on
cpu_of(rq). Is this expected?
Or should the irq_work be queued to cpu_of(rq)?
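
To make the question concrete, here is a minimal user-space sketch of the
situation, not kernel code. It assumes only what is described in this thread:
the hook is looked up for cpu_of(rq), while the handler queues its irq work
on whichever CPU happens to execute it. All model_* names are hypothetical
stand-ins and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the kernel's struct rq: only the owning CPU. */
struct model_rq {
    int cpu;
};

/* Hypothetical stand-in for struct update_util_data. */
struct model_hook {
    int (*func)(struct model_hook *hook, int running_cpu);
};

/* One hook slot per CPU, like the per-CPU cpufreq_update_util_data pointer. */
struct model_hook *model_hooks[MODEL_NR_CPUS];

/* Models cpu_of(rq). */
int model_cpu_of(const struct model_rq *rq)
{
    return rq->cpu;
}

/*
 * Models a handler that queues irq work locally: the work lands on the
 * CPU executing the handler, so that CPU number is returned.
 */
int model_handler(struct model_hook *hook, int running_cpu)
{
    (void)hook;
    return running_cpu;
}

/*
 * Models cpufreq_update_util(): the hook is looked up for cpu_of(rq),
 * but the callback executes on running_cpu. Returns the CPU that ends up
 * with the irq work, or -1 if cpu_of(rq) has no hook installed.
 */
int model_update_util(const struct model_rq *rq, int running_cpu)
{
    int target = model_cpu_of(rq);
    struct model_hook *hook;

    assert(target >= 0 && target < MODEL_NR_CPUS);
    hook = model_hooks[target];
    return hook ? hook->func(hook, running_cpu) : -1;
}
```

In this model, once a hook is installed for CPU 2, calling
model_update_util() for CPU 2's runqueue from CPU 1 reports CPU 1 as the
place where the irq work is queued, which is the mismatch being asked about.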

Thanks,
Peng.

> 
> > The only thing I can imagine is that changes in the kernel/sched/ folder
> > cause this difference, but I still need more time to figure out which
> > changes cause it. If you have any suggestions, please advise, thanks!
> 
> The CPU offline/online (hotplug) rework was done after 4.19 IIRC and that
> changed the way online works.  Now, it runs on the CPU going online and
> previously it ran on the CPU "asking" the other one to go online.  That may
> be what makes the difference (if my recollection of the time frame is correct).

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  5:53                                       ` Peng Fan
@ 2019-12-10  7:05                                         ` Viresh Kumar
  2019-12-10  8:22                                           ` Rafael J. Wysocki
  2019-12-10  8:12                                         ` Rafael J. Wysocki
  1 sibling, 1 reply; 57+ messages in thread
From: Viresh Kumar @ 2019-12-10  7:05 UTC (permalink / raw)
  To: Peng Fan
  Cc: Rafael J. Wysocki, Anson Huang, Rafael J. Wysocki, Jacky Bai,
	linux-pm, vincent.guittot, peterz, paulmck

+few more guys

On 10-12-19, 05:53, Peng Fan wrote:
> But per https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> cpu_of(rq) and smp_processor_id() are not necessarily the same.
>
> When cpu_of(rq) is not equal to smp_processor_id(), dbs_update_util_handler()
> will use irq_work_queue() to queue the work on smp_processor_id(), not on
> cpu_of(rq). Is this expected?
> Or should the irq_work be queued to cpu_of(rq)?

Okay, sorry for the long weekend where I couldn't get time to reply at all.

First of all, let's try to understand dvfs_possible_from_any_cpu.

Who can update the frequency of a CPU? For many architectures/platforms the
eventual code that writes to some register to change the frequency should only
run on the local CPU, as these registers are per-cpu registers and not something
shared between CPUs.

But for the ARM architecture, we have a PLL and then some more registers to play
with the clk provided to the CPU blocks and these registers (which are updated
as a result of clk_set_rate()) are part of a block outside of the CPU blocks.
And so any CPU (even if it is not part of the same cpufreq policy) can update
it. Setting this flag allows that and eventually we may end up updating the
frequency sooner, instead of later (which may be less effective). That was the
idea of the remote-wakeup series. This stuff is absolutely correct and so
cpufreq-dt does it for everyone.

This also means that the normal work and irq-work both can run on any CPU for
your platform and it should be okay to do that.
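
As a rough illustration of what the flag changes, here is a small user-space
sketch, not the kernel's actual code. It assumes the decision reduces to
"the platform allows DVFS from any CPU, or this CPU belongs to the policy".
The struct and the model_* functions are hypothetical simplifications; only
the field names policy->cpus and dvfs_possible_from_any_cpu come from the
discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct cpufreq_policy. */
struct model_policy {
    unsigned long cpus;               /* bitmask modelling policy->cpus */
    bool dvfs_possible_from_any_cpu;  /* the flag discussed above */
};

/* Models cpumask_test_cpu() on the simplified bitmask. */
bool model_cpu_in_policy(const struct model_policy *policy, int cpu)
{
    assert(cpu >= 0 && cpu < (int)(8 * sizeof(policy->cpus)));
    return (policy->cpus >> cpu) & 1UL;
}

/*
 * Models the decision described above: a CPU may process a frequency
 * update if the platform allows DVFS from any CPU, or if the CPU
 * belongs to the policy.
 */
bool model_can_do_dvfs(const struct model_policy *policy, int this_cpu)
{
    return policy->dvfs_possible_from_any_cpu ||
           model_cpu_in_policy(policy, this_cpu);
}
```

With the flag clear, only CPUs in the mask pass the check. With the flag
set, every CPU passes, including one that is not (or is no longer) in
policy->cpus.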

Now, we have necessary measures in place to make sure that after stopping and
before starting a governor, the scheduler hooks to save the cpufreq governor
pointer and updates to policy->cpus are made properly, to make sure that we
never ever schedule a work or irq-work on a CPU which is offline. Now it looks
like this isn't working as expected and we need to find what exactly is broken
here.
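
The intended ordering can be sketched in user space as follows. This is a
single-threaded model under stated assumptions, not the cpufreq core: it
follows the sequence described earlier in the thread (stop the governor,
clear the CPU from policy->cpus, start the governor again on the remaining
CPUs), and the RCU grace period is only noted in a comment. All model_*
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for struct update_util_data. */
struct model_hook {
    int owner_cpu;
};

/* Hypothetical, simplified policy: a CPU bitmask plus per-CPU hook storage. */
struct model_policy {
    unsigned long cpus;                        /* models policy->cpus */
    struct model_hook storage[MODEL_NR_CPUS];
};

/* Models the per-CPU cpufreq_update_util_data pointers. */
struct model_hook *model_hooks[MODEL_NR_CPUS];

/* Governor start: install a hook for every CPU currently in policy->cpus. */
void model_governor_start(struct model_policy *policy)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (policy->cpus & (1UL << cpu)) {
            policy->storage[cpu].owner_cpu = cpu;
            model_hooks[cpu] = &policy->storage[cpu];
        }
    }
}

/*
 * Governor stop: remove the hook of every CPU in policy->cpus. The kernel
 * then waits for an RCU grace period; this single-threaded model has no
 * concurrent readers, so that wait is only noted here.
 */
void model_governor_stop(struct model_policy *policy)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (policy->cpus & (1UL << cpu))
            model_hooks[cpu] = NULL;
    }
}

/* Offline path: stop the governor, drop the CPU, restart on the rest. */
void model_cpu_offline(struct model_policy *policy, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_governor_stop(policy);
    policy->cpus &= ~(1UL << cpu);
    if (policy->cpus)
        model_governor_start(policy);
}
```

After model_cpu_offline() the hook slot of the offlined CPU stays NULL while
the remaining CPUs get their hooks back, which is the state the measures
described above are meant to guarantee.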

And yes, I did the testing on Hikey 620, an octa-core ARM platform which has a
single cpufreq policy which has all the 8 CPUs. And yes, I am using cpufreq-dt
only and I wasn't able to reproduce the problem with mainline kernel as I
explained earlier.

The problem is somewhere around the scheduler's governor hook running, or
queuing work, on a CPU which is in the middle of going offline/online, and
there is some race around that. The problem hence may not be limited to
cpufreq; it may affect a wider variety of clients.

-- 
viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  5:53                                       ` Peng Fan
  2019-12-10  7:05                                         ` Viresh Kumar
@ 2019-12-10  8:12                                         ` Rafael J. Wysocki
  1 sibling, 0 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10  8:12 UTC (permalink / raw)
  To: Peng Fan
  Cc: Rafael J. Wysocki, Anson Huang, Rafael J. Wysocki, Viresh Kumar,
	Jacky Bai, linux-pm

On Tue, Dec 10, 2019 at 6:53 AM Peng Fan <peng.fan@nxp.com> wrote:
>
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> >
> > On Mon, Dec 9, 2019 at 1:32 PM Anson Huang <anson.huang@nxp.com>
> > wrote:
> > >
> > > > On Dec 9, 2019, at 19:23, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > >> On Mon, Dec 9, 2019 at 11:57 AM Anson Huang <anson.huang@nxp.com>
> > wrote:
> > > >>
> > > >> Forgot to mention that the patch below on v5.4 can easily reproduce
> > > >> the panic() on our platforms, which I think is unexpected: policy->cpus
> > > >> has already been updated after the governor stop, yet irq work is still
> > > >> queued on that CPU.
> > > >>
> > > >> static void dbs_update_util_handler(struct update_util_data *data,
> > > >>                                     u64 time, unsigned int flags)
> > > >> +       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
> > > >> +               panic("...irq work on offline cpu %d\n",
> > > >> +                     smp_processor_id());
> > > >>         irq_work_queue(&policy_dbs->irq_work);
> > > >
> > > > Yes, that is unexpected.
> > > >
> > > > In cpufreq_offline(), we have:
> > > >
> > > >    down_write(&policy->rwsem);
> > > >    if (has_target())
> > > >        cpufreq_stop_governor(policy);
> > > >
> > > >    cpumask_clear_cpu(cpu, policy->cpus);
> > > >
> > > > and cpufreq_stop_governor() calls policy->governor->stop(policy)
> > > > which is cpufreq_dbs_governor_stop().
> > > >
> > > > That calls gov_clear_update_util(policy_dbs->policy) first, which
> > > > invokes cpufreq_remove_update_util_hook() for each CPU in
> > > > policy->cpus and synchronizes RCU, so after that point none of the
> > > > policy->cpus is expected to run dbs_update_util_handler().
> > > >
> > > > policy->cpus is updated next and the governor is started again with
> > > > the new policy->cpus.  Because the offline CPU is not there, it is
> > > > not expected to run dbs_update_util_handler() again.
> > > >
> > > > Do you only get the original error when one of the CPUs goes back online?
> > >
> > > No, sometimes I also get this error while a CPU is going offline.
> > >
> > > But the point is NOT that dbs_update_util_handler() is called during
> > > governor stop; it is that this function is running on a CPU which has
> > > already finished the governor stop function.
> >
> > Yes, it is, and which should not be possible as per the above.
> >
> > The offline CPU is not there in policy->cpus when
> > cpufreq_dbs_governor_start() is called for the policy, so its
> > cpufreq_update_util_data pointer is not set (it is NULL at that time).
> > Therefore it is not expected to run dbs_update_util_handler() until it is
> > turned back online.
> >
> > > I thought the original expectation is that this function is ONLY executed
> > > on the CPU which needs frequency scaling?
> > > Is this correct?
> >
> > Yes, it is.
> >
> > > v4.19 follows this expectation while v5.4 does NOT.
> >
> > As per the kernel code, they both do.
>
> But per https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> cpu_of(rq) and smp_processor_id() are not necessarily the same.
>
> When cpu_of(rq) is not equal to smp_processor_id(), dbs_update_util_handler()
> will use irq_work_queue() to queue the work on smp_processor_id(), not on
> cpu_of(rq). Is this expected?

Yes, it is, in general.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  7:05                                         ` Viresh Kumar
@ 2019-12-10  8:22                                           ` Rafael J. Wysocki
  2019-12-10  8:29                                             ` Anson Huang
  2019-12-10  8:31                                             ` Viresh Kumar
  0 siblings, 2 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10  8:22 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Peng Fan, Rafael J. Wysocki, Anson Huang, Rafael J. Wysocki,
	Jacky Bai, linux-pm, Vincent Guittot, Peter Zijlstra,
	Paul McKenney

On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> +few more guys
>
> On 10-12-19, 05:53, Peng Fan wrote:
> > But per https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > cpu_of(rq) and smp_processor_id() are not necessarily the same.
> >
> > When cpu_of(rq) is not equal to smp_processor_id(), dbs_update_util_handler()
> > will use irq_work_queue() to queue the work on smp_processor_id(), not on
> > cpu_of(rq). Is this expected?
> > Or should the irq_work be queued to cpu_of(rq)?
>
> Okay, sorry for the long weekend where I couldn't get time to reply at all.

No worries. :-)

> First of all, let's try to understand dvfs_possible_from_any_cpu.
>
> Who can update the frequency of a CPU? For many architectures/platforms the
> eventual code that writes to some register to change the frequency should only
> run on the local CPU, as these registers are per-cpu registers and not something
> shared between CPUs.
>
> But for the ARM architecture, we have a PLL and then some more registers to play
> with the clk provided to the CPU blocks and these registers (which are updated
> as a result of clk_set_rate()) are part of a block outside of the CPU blocks.
> And so any CPU (even if it is not part of the same cpufreq policy) can update
> it. Setting this flag allows that and eventually we may end up updating the
> frequency sooner, instead of later (which may be less effective). That was the
> idea of the remote-wakeup series. This stuff is absolutely correct and so
> cpufreq-dt does it for everyone.
>
> This also means that the normal work and irq-work both can run on any CPU for
> your platform and it should be okay to do that.

And in the failing case all of the CPUs in the system are in the same
policy anyway, so dvfs_possible_from_any_cpu is a red herring.

> Now, we have necessary measures in place to make sure that after stopping and
> before starting a governor, the scheduler hooks to save the cpufreq governor
> pointer and updates to policy->cpus are made properly, to make sure that we
> never ever schedule a work or irq-work on a CPU which is offline. Now it looks
> like this isn't working as expected and we need to find what exactly is broken
> here.
>
> And yes, I did the testing on Hikey 620, an octa-core ARM platform which has a
> single cpufreq policy which has all the 8 CPUs. And yes, I am using cpufreq-dt
> only and I wasn't able to reproduce the problem with mainline kernel as I
> explained earlier.
>
> The problem is somewhere between the scheduler's governor hook running or
> queuing work on a CPU which is in the middle of getting offline/online and there
> is some race around that. The problem hence may not be related to just cpufreq,
> but a wider variety of clients.

The problem is that a CPU is running a governor hook which it
shouldn't be running at all.

The observation that dvfs_possible_from_any_cpu makes a difference
only means that the governor hook is running on a CPU that is not
present in the policy->cpus mask.  On the platform(s) in question this
cannot happen as long as RCU works as expected.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:22                                           ` Rafael J. Wysocki
@ 2019-12-10  8:29                                             ` Anson Huang
  2019-12-10  8:36                                               ` Viresh Kumar
  2019-12-10  8:37                                               ` Rafael J. Wysocki
  2019-12-10  8:31                                             ` Viresh Kumar
  1 sibling, 2 replies; 57+ messages in thread
From: Anson Huang @ 2019-12-10  8:29 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: Peng Fan, Rafael J. Wysocki, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney



> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Tuesday, December 10, 2019 4:22 PM
> To: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki <rafael@kernel.org>;
> Anson Huang <anson.huang@nxp.com>; Rafael J. Wysocki
> <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>; Peter
> Zijlstra <peterz@infradead.org>; Paul McKenney
> <paulmck@linux.vnet.ibm.com>
> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar <viresh.kumar@linaro.org>
> wrote:
> >
> > +few more guys
> >
> > On 10-12-19, 05:53, Peng Fan wrote:
> > > But per
> > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > cpu_of(rq) and smp_processor_id() are not necessarily the same.
> > >
> > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > dbs_update_util_handler() will use irq_work_queue() to queue the work
> > > on smp_processor_id(), not on cpu_of(rq). Is this expected?
> > > Or should the irq_work be queued to cpu_of(rq)?
> >
> > Okay, sorry for the long weekend where I couldn't get time to reply at all.
> 
> No worries. :-)
> 
> > First of all, let's try to understand dvfs_possible_from_any_cpu.
> >
> > Who can update the frequency of a CPU? For many
> > architectures/platforms the eventual code that writes to some register
> > to change the frequency should only run on the local CPU, as these
> > registers are per-cpu registers and not something shared between CPUs.
> >
> > But for the ARM architecture, we have a PLL and then some more
> > registers to play with the clk provided to the CPU blocks and these
> > registers (which are updated as a result of clk_set_rate()) are part of a
> > block outside of the CPU blocks.
> > And so any CPU (even if it is not part of the same cpufreq policy) can
> > update it. Setting this flag allows that and eventually we may end up
> > updating the frequency sooner, instead of later (which may be less
> > effective). That was the idea of the remote-wakeup series. This stuff
> > is absolutely correct and so cpufreq-dt does it for everyone.
> >
> > This also means that the normal work and irq-work both can run on any
> > CPU for your platform and it should be okay to do that.
>
> And in the failing case all of the CPUs in the system are in the same policy
> anyway, so dvfs_possible_from_any_cpu is a red herring.
> 
> > Now, we have necessary measures in place to make sure that after
> > stopping and before starting a governor, the scheduler hooks to save
> > the cpufreq governor pointer and updates to policy->cpus are made
> > properly, to make sure that we never ever schedule a work or irq-work
> > on a CPU which is offline. Now it looks like this isn't working as
> > expected and we need to find what exactly is broken here.
> >
> > And yes, I did the testing on Hikey 620, an octa-core ARM platform
> > which has a single cpufreq policy which has all the 8 CPUs. And yes, I
> > am using cpufreq-dt only and I wasn't able to reproduce the problem
> > with mainline kernel as I explained earlier.
> >
> > The problem is somewhere between the scheduler's governor hook running
> > or queuing work on a CPU which is in the middle of getting
> > offline/online and there is some race around that. The problem hence
> > may not be related to just cpufreq, but a wider variety of clients.
> 
> The problem is that a CPU is running a governor hook which it shouldn't be
> running at all.
> 
> The observation that dvfs_possible_from_any_cpu makes a difference only
> means that the governor hook is running on a CPU that is not present in the
> policy->cpus mask.  On the platform(s) in question this cannot happen as
> long as RCU works as expected.

If I understand correctly, the governor hook is cleared ONLY for the CPU going
offline, after the governor has been stopped. But the CPU going offline can
still enter the function below to update the util on behalf of another CPU,
and that function ONLY checks the governor hook of cpu_of(rq), which is valid
because that CPU is online.

So the question is how to prevent a CPU that is going offline, and has already
finished the governor stop flow, from being scheduled to update the util on
behalf of another CPU.

 static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
 {
         struct update_util_data *data;

         data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
                                                   cpu_of(rq)));
         if (data)
                 data->func(data, rq_clock(rq), flags);
 }

Anson

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:22                                           ` Rafael J. Wysocki
  2019-12-10  8:29                                             ` Anson Huang
@ 2019-12-10  8:31                                             ` Viresh Kumar
  1 sibling, 0 replies; 57+ messages in thread
From: Viresh Kumar @ 2019-12-10  8:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peng Fan, Anson Huang, Rafael J. Wysocki, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

On 10-12-19, 09:22, Rafael J. Wysocki wrote:
> The problem is that a CPU is running a governor hook which it
> shouldn't be running at all.
> 
> The observation that dvfs_possible_from_any_cpu makes a difference
> only means that the governor hook is running on a CPU that is not
> present in the policy->cpus mask.  On the platform(s) in question this
> cannot happen as long as RCU works as expected.

I was worried about RCUs the day I saw the first email from Anson, but going
there meant more trouble and so I thought it must be something else :)

-- 
viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:29                                             ` Anson Huang
@ 2019-12-10  8:36                                               ` Viresh Kumar
  2019-12-10  8:37                                                 ` Peng Fan
  2019-12-10  8:37                                               ` Rafael J. Wysocki
  1 sibling, 1 reply; 57+ messages in thread
From: Viresh Kumar @ 2019-12-10  8:36 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Peng Fan, Rafael J. Wysocki, Jacky Bai,
	linux-pm, Vincent Guittot, Peter Zijlstra, Paul McKenney

On 10-12-19, 08:29, Anson Huang wrote:
> If I understand correctly, the governor hook is cleared ONLY for the CPU going
> offline, after the governor has been stopped. But the CPU going offline can
> still enter the function below to update the util on behalf of another CPU,
> and that function ONLY checks the governor hook of cpu_of(rq), which is valid
> because that CPU is online.

An offline CPU should never be running this helper as its hook is cleared
followed by rcu-sync.

-- 
viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:36                                               ` Viresh Kumar
@ 2019-12-10  8:37                                                 ` Peng Fan
  0 siblings, 0 replies; 57+ messages in thread
From: Peng Fan @ 2019-12-10  8:37 UTC (permalink / raw)
  To: Viresh Kumar, Anson Huang
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On 10-12-19, 08:29, Anson Huang wrote:
> > If I understand correctly, the governor hook is cleared ONLY for the CPU
> > going offline, after the governor has been stopped. But the CPU going
> > offline can still enter the function below to update the util on behalf
> > of another CPU, and that function ONLY checks the governor hook of
> > cpu_of(rq), which is valid because that CPU is online.
> 
> An offline CPU should never be running this helper as its hook is cleared
> followed by rcu-sync.

Might the RCU synchronization be buggy and return early while a reader is
still in rcu_dereference_sched()?

Thanks,
Peng.

> 
> --
> viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:29                                             ` Anson Huang
  2019-12-10  8:36                                               ` Viresh Kumar
@ 2019-12-10  8:37                                               ` Rafael J. Wysocki
  2019-12-10  8:43                                                 ` Peng Fan
  2019-12-10  8:45                                                 ` Anson Huang
  1 sibling, 2 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10  8:37 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Viresh Kumar, Peng Fan, Rafael J. Wysocki,
	Jacky Bai, linux-pm, Vincent Guittot, Peter Zijlstra,
	Paul McKenney

On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Rafael J. Wysocki <rafael@kernel.org>
> > Sent: Tuesday, December 10, 2019 4:22 PM
> > To: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki <rafael@kernel.org>;
> > Anson Huang <anson.huang@nxp.com>; Rafael J. Wysocki
> > <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>; Peter
> > Zijlstra <peterz@infradead.org>; Paul McKenney
> > <paulmck@linux.vnet.ibm.com>
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> >
> > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar <viresh.kumar@linaro.org>
> > wrote:
> > >
> > > +few more guys
> > >
> > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > But per
> > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > cpu_of(rq) and smp_processor_id() are not necessarily the same.
> > > >
> > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > dbs_update_util_handler() will use irq_work_queue() to queue the work
> > > > on smp_processor_id(), not on cpu_of(rq). Is this expected?
> > > > Or should the irq_work be queued to cpu_of(rq)?
> > >
> > > Okay, sorry for the long weekend where I couldn't get time to reply at all.
> >
> > No worries. :-)
> >
> > > First of all, let's try to understand dvfs_possible_from_any_cpu.
> > >
> > > Who can update the frequency of a CPU? For many
> > > architectures/platforms the eventual code that writes to some register
> > > to change the frequency should only run on the local CPU, as these
> > > registers are per-cpu registers and not something shared between CPUs.
> > >
> > > But for the ARM architecture, we have a PLL and then some more
> > > registers to play with the clk provided to the CPU blocks and these
> > > registers (which are updated as a result of clk_set_rate()) are part of a
> > > block outside of the CPU blocks.
> > > And so any CPU (even if it is not part of the same cpufreq policy) can
> > > update it. Setting this flag allows that and eventually we may end up
> > > updating the frequency sooner, instead of later (which may be less
> > > effective). That was the idea of the remote-wakeup series. This stuff
> > > is absolutely correct and so cpufreq-dt does it for everyone.
> > >
> > > This also means that the normal work and irq-work both can run on any
> > > CPU for your platform and it should be okay to do that.
> >
> > And in the failing case all of the CPUs in the system are in the same policy
> > anyway, so dvfs_possible_from_any_cpu is a red herring.
> >
> > > Now, we have necessary measures in place to make sure that after
> > > stopping and before starting a governor, the scheduler hooks to save
> > > the cpufreq governor pointer and updates to policy->cpus are made
> > > properly, to make sure that we never ever schedule a work or irq-work
> > > on a CPU which is offline. Now it looks like this isn't working as
> > > expected and we need to find what exactly is broken here.
> > >
> > > And yes, I did the testing on Hikey 620, an octa-core ARM platform
> > > which has a single cpufreq policy which has all the 8 CPUs. And yes, I
> > > am using cpufreq-dt only and I wasn't able to reproduce the problem
> > > with mainline kernel as I explained earlier.
> > >
> > > The problem is somewhere between the scheduler's governor hook running
> > > or queuing work on a CPU which is in the middle of getting
> > > offline/online and there is some race around that. The problem hence
> > > may not be related to just cpufreq, but a wider variety of clients.
> >
> > The problem is that a CPU is running a governor hook which it shouldn't be
> > running at all.
> >
> > The observation that dvfs_possible_from_any_cpu makes a difference only
> > means that the governor hook is running on a CPU that is not present in the
> > policy->cpus mask.  On the platform(s) in question this cannot happen as
> > long as RCU works as expected.
>
> If I understand correctly, the governor hook is cleared ONLY for the CPU going
> offline, after the governor has been stopped. But the CPU going offline can
> still enter the function below to update the util on behalf of another CPU,
> and that function ONLY checks the governor hook of cpu_of(rq), which is valid
> because that CPU is online.
>
> So the question is how to prevent a CPU that is going offline, and has already
> finished the governor stop flow, from being scheduled to update the util on
> behalf of another CPU.
>
>  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
>  {
>          struct update_util_data *data;
>
>          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
>                                                    cpu_of(rq)));
>          if (data)
>                  data->func(data, rq_clock(rq), flags);
>  }

OK, so that's where the problem is, good catch!

So what happens is that a CPU going offline runs some scheduler code
that invokes cpufreq_update_util().  The CPU running that code is not
cpu_of(rq), but cpu_of(rq) is still online, so the callback is invoked
and then the policy->cpus test is bypassed because of
dvfs_possible_from_any_cpu.
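
The sequence just described can be replayed in a small user-space model.
This is a sketch under stated assumptions, not kernel code: hooks are reduced
to booleans, policy->cpus to a bitmask, and "queuing irq work" to returning
the CPU it would land on. The model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical, simplified stand-in for struct cpufreq_policy. */
struct model_policy {
    unsigned long cpus;               /* models policy->cpus */
    bool dvfs_possible_from_any_cpu;
};

/* Models the per-CPU hook pointers: true means a hook is installed. */
bool model_hook_set[MODEL_NR_CPUS];

/*
 * Returns the CPU on which irq work would be queued when running_cpu
 * updates utilization for the runqueue of target_cpu, or -1 if nothing
 * is queued. Only the hook of target_cpu is consulted, as in the
 * cpufreq_update_util() quoted above.
 */
int model_update_util(const struct model_policy *policy,
                      int running_cpu, int target_cpu)
{
    assert(running_cpu >= 0 && running_cpu < MODEL_NR_CPUS);
    assert(target_cpu >= 0 && target_cpu < MODEL_NR_CPUS);

    if (!model_hook_set[target_cpu])
        return -1;                    /* no hook for cpu_of(rq) */

    /* The policy->cpus test, bypassed when the flag is set. */
    if (!policy->dvfs_possible_from_any_cpu &&
        !(policy->cpus & (1UL << running_cpu)))
        return -1;

    return running_cpu;               /* irq work is queued locally */
}
```

Take CPU 3 as the one going offline: its hook is cleared and it is no longer
in the mask, while CPUs 0-2 keep their hooks. With the flag set, an update
for CPU 1's runqueue executed on CPU 3 still ends with work queued on CPU 3.
With the flag clear, the policy->cpus test rejects it.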

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:37                                               ` Rafael J. Wysocki
@ 2019-12-10  8:43                                                 ` Peng Fan
  2019-12-10  8:45                                                 ` Anson Huang
  1 sibling, 0 replies; 57+ messages in thread
From: Peng Fan @ 2019-12-10  8:43 UTC (permalink / raw)
  To: Rafael J. Wysocki, Anson Huang
  Cc: Viresh Kumar, Rafael J. Wysocki, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>; Rafael J.
> > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>;
> > > Peter Zijlstra <peterz@infradead.org>; Paul McKenney
> > > <paulmck@linux.vnet.ibm.com>
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > >
> > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > <viresh.kumar@linaro.org>
> > > wrote:
> > > >
> > > > +few more guys
> > > >
> > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > But per
> > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > cpu_of(rq) and smp_processor_id() is possible to not the same,
> > > > >
> > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > >
> > > > Okay, sorry for the long weekend where I couldn't get time to reply at all.
> > >
> > > No worries. :-)
> > >
> > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > >
> > > > Who can update the frequency of a CPU ? For many
> > > > architectures/platforms the eventual code that writes to some
> > > > register to change the frequency should only run on the local CPU,
> > > > as these registers are per-cpu registers and not something shared
> between CPUs.
> > > >
> > > > But for the ARM architecture, we have a PLL and then some more
> > > > registers to play with the clk provided to the CPU blocks and
> > > > these registers (which are updated as a result of clk_set_rate())
> > > > are part of a
> > > block outside of the CPU blocks.
> > > > And so any CPU (even if it is not part of the same cpufreq policy)
> > > > can update it. Setting this flag allows that and eventually we may
> > > > end up updating the frequency sooner, instead of later (which may
> > > > be less effective). That was the idea of the remote-wakeup series.
> > > > This stuff is absolutely correct and so cpufreq-dt does it for everyone.
> > > >
> > > > This also means that the normal work and irq-work both can run on
> > > > any CPU for your platform and it should be okay to do that.
> > >
> > > And it the failing case all of the CPUs in the system are in the
> > > same policy anyway, so dvfs_possible_from_any_cpu is a red herring.
> > >
> > > > Now, we have necessary measures in place to make sure that after
> > > > stopping and before starting a governor, the scheduler hooks to
> > > > save the cpufreq governor pointer and updates to policy->cpus are
> > > > made properly, to make sure that we never ever schedule a work or
> > > > irq-work on a CPU which is offline. Now it looks like this isn't
> > > > working as expected and we need to find what exactly is broken here.
> > > >
> > > > And yes, I did the testing on Hikey 620, an octa-core ARM platform
> > > > which has a single cpufreq policy which has all the 8 CPUs. And
> > > > yes, I am using cpufreq-dt only and I wasn't able to reproduce the
> > > > problem with mainline kernel as I explained earlier.
> > > >
> > > > The problem is somewhere between the scheduler's governor hook
> > > running
> > > > or queuing work on a CPU which is in the middle of getting
> > > > offline/online and there is some race around that. The problem
> > > > hence may not be related to just cpufreq, but a wider variety of clients.
> > >
> > > The problem is that a CPU is running a governor hook which it
> > > shouldn't be running at all.
> > >
> > > > The observation that dvfs_possible_from_any_cpu makes a difference
> > > > only means that the governor hook is running on a CPU that is not
> > > > present in the policy->cpus mask.  On the platform(s) in question
> > > > this cannot happen as long as RCU works as expected.
> >
> > If I understand correctly, the governor hook ONLY be clear on the CPU
> > being offline and after governor stopped, but the CPU being offline
> > could still run into below function to help other CPU update the util,
> > and it ONLY checks the cpu_of(rq)'s governor hook which is valid as that
> > CPU is online.
> >
> > So the question is how to avoid the CPU being offline and already
> > finish the governor stop flow be scheduled to help other CPU update the
> > util.
> >
> >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> >  {
> >          struct update_util_data *data;
> >
> >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> >                                                    cpu_of(rq)));
> >          if (data)
> >                  data->func(data, rq_clock(rq), flags);
> >  }
> 
> OK, so that's where the problem is, good catch!
> 
> So what happens is that a CPU going offline runs some scheduler code that
> invokes cpufreq_update_util().  Incidentally, it is not the cpu_of(rq), but that
> CPU is still online, so the callback is invoked and then policy->cpus test is
> bypassed because of dvfs_possible_from_any_cpu.

gov_clear_update_util(policy_dbs->policy) ends with synchronize_rcu(), which
should wait for the rcu_dereference_sched() readers, right? Or do I understand
it wrong?

Thanks,
Peng.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:37                                               ` Rafael J. Wysocki
  2019-12-10  8:43                                                 ` Peng Fan
@ 2019-12-10  8:45                                                 ` Anson Huang
  2019-12-10  8:50                                                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-12-10  8:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Peng Fan, Rafael J. Wysocki, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney



> -----Original Message-----
> From: Rafael J. Wysocki <rafael@kernel.org>
> Sent: Tuesday, December 10, 2019 4:38 PM
> To: Anson Huang <anson.huang@nxp.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael J.
> Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>; Peter
> Zijlstra <peterz@infradead.org>; Paul McKenney
> <paulmck@linux.vnet.ibm.com>
> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com>
> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>; Rafael J.
> > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>;
> > > Peter Zijlstra <peterz@infradead.org>; Paul McKenney
> > > <paulmck@linux.vnet.ibm.com>
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > >
> > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > <viresh.kumar@linaro.org>
> > > wrote:
> > > >
> > > > +few more guys
> > > >
> > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > But per
> > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > cpu_of(rq) and smp_processor_id() is possible to not the same,
> > > > >
> > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > >
> > > > Okay, sorry for the long weekend where I couldn't get time to reply at
> all.
> > >
> > > No worries. :-)
> > >
> > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > >
> > > > Who can update the frequency of a CPU ? For many
> > > > architectures/platforms the eventual code that writes to some
> > > > register to change the frequency should only run on the local CPU,
> > > > as these registers are per-cpu registers and not something shared
> between CPUs.
> > > >
> > > > But for the ARM architecture, we have a PLL and then some more
> > > > registers to play with the clk provided to the CPU blocks and
> > > > these registers (which are updated as a result of clk_set_rate())
> > > > are part of a
> > > block outside of the CPU blocks.
> > > > And so any CPU (even if it is not part of the same cpufreq policy)
> > > > can update it. Setting this flag allows that and eventually we may
> > > > end up updating the frequency sooner, instead of later (which may
> > > > be less effective). That was the idea of the remote-wakeup series.
> > > > This stuff is absolutely correct and so cpufreq-dt does it for everyone.
> > > >
> > > > This also means that the normal work and irq-work both can run on
> > > > any CPU for your platform and it should be okay to do that.
> > >
> > > And it the failing case all of the CPUs in the system are in the
> > > same policy anyway, so dvfs_possible_from_any_cpu is a red herring.
> > >
> > > > Now, we have necessary measures in place to make sure that after
> > > > stopping and before starting a governor, the scheduler hooks to
> > > > save the cpufreq governor pointer and updates to policy->cpus are
> > > > made properly, to make sure that we never ever schedule a work or
> > > > irq-work on a CPU which is offline. Now it looks like this isn't
> > > > working as expected and we need to find what exactly is broken here.
> > > >
> > > > And yes, I did the testing on Hikey 620, an octa-core ARM platform
> > > > which has a single cpufreq policy which has all the 8 CPUs. And
> > > > yes, I am using cpufreq-dt only and I wasn't able to reproduce the
> > > > problem with mainline kernel as I explained earlier.
> > > >
> > > > The problem is somewhere between the scheduler's governor hook
> > > running
> > > > or queuing work on a CPU which is in the middle of getting
> > > > offline/online and there is some race around that. The problem
> > > > hence may not be related to just cpufreq, but a wider variety of clients.
> > >
> > > The problem is that a CPU is running a governor hook which it
> > > shouldn't be running at all.
> > >
> > > > The observation that dvfs_possible_from_any_cpu makes a difference
> > > > only means that the governor hook is running on a CPU that is not
> > > > present in the policy->cpus mask.  On the platform(s) in question
> > > > this cannot happen as long as RCU works as expected.
> >
> > If I understand correctly, the governor hook ONLY be clear on the CPU
> > being offline and after governor stopped, but the CPU being offline
> > could still run into below function to help other CPU update the util,
> > and it ONLY checks the cpu_of(rq)'s governor hook which is valid as that
> > CPU is online.
> >
> > So the question is how to avoid the CPU being offline and already
> > finish the governor stop flow be scheduled to help other CPU update the
> > util.
> >
> >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> >  {
> >          struct update_util_data *data;
> >
> >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> >                                                    cpu_of(rq)));
> >          if (data)
> >                  data->func(data, rq_clock(rq), flags);
> >  }
> 
> OK, so that's where the problem is, good catch!
> 
> So what happens is that a CPU going offline runs some scheduler code that
> invokes cpufreq_update_util().  Incidentally, it is not the cpu_of(rq), but that
> CPU is still online, so the callback is invoked and then policy->cpus test is
> bypassed because of dvfs_possible_from_any_cpu.

If this is the issue, should we add another check here for the current CPU's
governor hook? Or is there a better place to make sure that a CPU going offline
does not get irq work queued on it?

Anson


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:45                                                 ` Anson Huang
@ 2019-12-10  8:50                                                   ` Rafael J. Wysocki
  2019-12-10  8:51                                                     ` Anson Huang
                                                                       ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10  8:50 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Viresh Kumar, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote:
> 
> > -----Original Message-----
> > From: Rafael J. Wysocki <rafael@kernel.org>
> > Sent: Tuesday, December 10, 2019 4:38 PM
> > To: Anson Huang <anson.huang@nxp.com>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael J.
> > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>; Peter
> > Zijlstra <peterz@infradead.org>; Paul McKenney
> > <paulmck@linux.vnet.ibm.com>
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > 
> > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com>
> > wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>; Rafael J.
> > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>;
> > > > Peter Zijlstra <peterz@infradead.org>; Paul McKenney
> > > > <paulmck@linux.vnet.ibm.com>
> > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > > >
> > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > <viresh.kumar@linaro.org>
> > > > wrote:
> > > > >
> > > > > +few more guys
> > > > >
> > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > But per
> > > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > > cpu_of(rq) and smp_processor_id() is possible to not the same,
> > > > > >
> > > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > >
> > > > > Okay, sorry for the long weekend where I couldn't get time to reply at
> > all.
> > > >
> > > > No worries. :-)
> > > >
> > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > >
> > > > > Who can update the frequency of a CPU ? For many
> > > > > architectures/platforms the eventual code that writes to some
> > > > > register to change the frequency should only run on the local CPU,
> > > > > as these registers are per-cpu registers and not something shared
> > between CPUs.
> > > > >
> > > > > But for the ARM architecture, we have a PLL and then some more
> > > > > registers to play with the clk provided to the CPU blocks and
> > > > > these registers (which are updated as a result of clk_set_rate())
> > > > > are part of a
> > > > block outside of the CPU blocks.
> > > > > And so any CPU (even if it is not part of the same cpufreq policy)
> > > > > can update it. Setting this flag allows that and eventually we may
> > > > > end up updating the frequency sooner, instead of later (which may
> > > > > be less effective). That was the idea of the remote-wakeup series.
> > > > > This stuff is absolutely correct and so cpufreq-dt does it for everyone.
> > > > >
> > > > > This also means that the normal work and irq-work both can run on
> > > > > any CPU for your platform and it should be okay to do that.
> > > >
> > > > And it the failing case all of the CPUs in the system are in the
> > > > same policy anyway, so dvfs_possible_from_any_cpu is a red herring.
> > > >
> > > > > Now, we have necessary measures in place to make sure that after
> > > > > stopping and before starting a governor, the scheduler hooks to
> > > > > save the cpufreq governor pointer and updates to policy->cpus are
> > > > > made properly, to make sure that we never ever schedule a work or
> > > > > irq-work on a CPU which is offline. Now it looks like this isn't
> > > > > working as expected and we need to find what exactly is broken here.
> > > > >
> > > > > And yes, I did the testing on Hikey 620, an octa-core ARM platform
> > > > > which has a single cpufreq policy which has all the 8 CPUs. And
> > > > > yes, I am using cpufreq-dt only and I wasn't able to reproduce the
> > > > > problem with mainline kernel as I explained earlier.
> > > > >
> > > > > The problem is somewhere between the scheduler's governor hook
> > > > running
> > > > > or queuing work on a CPU which is in the middle of getting
> > > > > offline/online and there is some race around that. The problem
> > > > > hence may not be related to just cpufreq, but a wider variety of clients.
> > > >
> > > > The problem is that a CPU is running a governor hook which it
> > > > shouldn't be running at all.
> > > >
> > > > > The observation that dvfs_possible_from_any_cpu makes a difference
> > > > > only means that the governor hook is running on a CPU that is not
> > > > > present in the policy->cpus mask.  On the platform(s) in question
> > > > > this cannot happen as long as RCU works as expected.
> > >
> > > If I understand correctly, the governor hook ONLY be clear on the CPU
> > > being offline and after governor stopped, but the CPU being offline
> > > could still run into below function to help other CPU update the util,
> > > and it ONLY checks the cpu_of(rq)'s governor hook which is valid as that
> > > CPU is online.
> > >
> > > So the question is how to avoid the CPU being offline and already
> > > finish the governor stop flow be scheduled to help other CPU update the
> > > util.
> > >
> > >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> > >  {
> > >          struct update_util_data *data;
> > >
> > >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > >                                                    cpu_of(rq)));
> > >          if (data)
> > >                  data->func(data, rq_clock(rq), flags);
> > >  }
> > 
> > OK, so that's where the problem is, good catch!
> > 
> > So what happens is that a CPU going offline runs some scheduler code that
> > invokes cpufreq_update_util().  Incidentally, it is not the cpu_of(rq), but that
> > CPU is still online, so the callback is invoked and then policy->cpus test is
> > bypassed because of dvfs_possible_from_any_cpu.
> 
> If this is the issue, add another check here for the current CPU's governor hook?
> Or any other better place to make sure the CPU being offline NOT to be queued to irq work?

Generally, yes.

Something like the patch below should help if I'm not mistaken:

---
 include/linux/cpufreq.h |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_
 {
 	/*
 	 * Allow remote callbacks if:
-	 * - dvfs_possible_from_any_cpu flag is set
 	 * - the local and remote CPUs share cpufreq policy
+	 * - dvfs_possible_from_any_cpu flag is set and the CPU running the
+	 *   code is not going offline.
 	 */
-	return policy->dvfs_possible_from_any_cpu ||
-		cpumask_test_cpu(smp_processor_id(), policy->cpus);
+	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
+		(policy->dvfs_possible_from_any_cpu &&
+		 !cpumask_test_cpu(smp_processor_id(), policy->related_cpus));
 }
 
 /*********************************************************************
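
A minimal userspace sketch of the check the patch above proposes, assuming
cpumasks can be modelled as plain bitmasks.  struct policy_model,
mask_test_cpu() and can_update_patched() are hypothetical names used only for
illustration, and this_cpu stands in for smp_processor_id(); this is not the
kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cpufreq_policy; cpumasks are bitmasks. */
struct policy_model {
	unsigned int cpus;               /* models policy->cpus (online CPUs) */
	unsigned int related_cpus;       /* models policy->related_cpus */
	bool dvfs_possible_from_any_cpu; /* models the flag of the same name */
};

/* Models cpumask_test_cpu() for up to 32 CPUs. */
static inline bool mask_test_cpu(unsigned int cpu, unsigned int mask)
{
	assert(cpu < 32);
	return (mask >> cpu) & 1u;
}

/*
 * Models the patched check: a CPU in policy->cpus may always update.
 * A CPU outside policy->cpus may update only if the flag is set and it
 * is not in policy->related_cpus either, which excludes a CPU of this
 * policy that has been dropped from policy->cpus while going offline.
 */
static inline bool can_update_patched(const struct policy_model *p,
				      unsigned int this_cpu)
{
	return mask_test_cpu(this_cpu, p->cpus) ||
		(p->dvfs_possible_from_any_cpu &&
		 !mask_test_cpu(this_cpu, p->related_cpus));
}
```

For a single policy covering CPUs 0-3 with CPU 3 already removed from cpus
(cpus = 0x7, related_cpus = 0xF), can_update_patched() rejects CPU 3 and
accepts CPU 0.  A CPU that belongs to a different policy is in neither mask,
so it is accepted whenever the flag is set.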




^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:50                                                   ` Rafael J. Wysocki
@ 2019-12-10  8:51                                                     ` Anson Huang
  2019-12-10 10:39                                                       ` Rafael J. Wysocki
  2019-12-10  8:57                                                     ` Viresh Kumar
  2019-12-10  9:04                                                     ` Rafael J. Wysocki
  2 siblings, 1 reply; 57+ messages in thread
From: Anson Huang @ 2019-12-10  8:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Viresh Kumar, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney



> -----Original Message-----
> From: Rafael J. Wysocki <rjw@rjwysocki.net>
> Sent: Tuesday, December 10, 2019 4:51 PM
> To: Anson Huang <anson.huang@nxp.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Jacky Bai
> <ping.bai@nxp.com>; linux-pm@vger.kernel.org; Vincent Guittot
> <vincent.guittot@linaro.org>; Peter Zijlstra <peterz@infradead.org>; Paul
> McKenney <paulmck@linux.vnet.ibm.com>
> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote:
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: Tuesday, December 10, 2019 4:38 PM
> > > To: Anson Huang <anson.huang@nxp.com>
> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael J.
> > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>;
> > > Peter Zijlstra <peterz@infradead.org>; Paul McKenney
> > > <paulmck@linux.vnet.ibm.com>
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > >
> > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com>
> > > wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>; Rafael
> J.
> > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > <peterz@infradead.org>; Paul McKenney
> > > > > <paulmck@linux.vnet.ibm.com>
> > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > > driver
> > > > >
> > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > > <viresh.kumar@linaro.org>
> > > > > wrote:
> > > > > >
> > > > > > +few more guys
> > > > > >
> > > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > > But per
> > > > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > > > cpu_of(rq) and smp_processor_id() is possible to not the
> > > > > > > same,
> > > > > > >
> > > > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > > >
> > > > > > Okay, sorry for the long weekend where I couldn't get time to
> > > > > > reply at
> > > all.
> > > > >
> > > > > No worries. :-)
> > > > >
> > > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > > >
> > > > > > Who can update the frequency of a CPU ? For many
> > > > > > architectures/platforms the eventual code that writes to some
> > > > > > register to change the frequency should only run on the local
> > > > > > CPU, as these registers are per-cpu registers and not
> > > > > > something shared
> > > between CPUs.
> > > > > >
> > > > > > But for the ARM architecture, we have a PLL and then some more
> > > > > > registers to play with the clk provided to the CPU blocks and
> > > > > > these registers (which are updated as a result of
> > > > > > clk_set_rate()) are part of a
> > > > > block outside of the CPU blocks.
> > > > > > And so any CPU (even if it is not part of the same cpufreq
> > > > > > policy) can update it. Setting this flag allows that and
> > > > > > eventually we may end up updating the frequency sooner,
> > > > > > instead of later (which may be less effective). That was the idea of
> the remote-wakeup series.
> > > > > > This stuff is absolutely correct and so cpufreq-dt does it for
> everyone.
> > > > > >
> > > > > > This also means that the normal work and irq-work both can run
> > > > > > on any CPU for your platform and it should be okay to do that.
> > > > >
> > > > > And it the failing case all of the CPUs in the system are in the
> > > > > same policy anyway, so dvfs_possible_from_any_cpu is a red herring.
> > > > >
> > > > > > Now, we have necessary measures in place to make sure that
> > > > > > after stopping and before starting a governor, the scheduler
> > > > > > hooks to save the cpufreq governor pointer and updates to
> > > > > > policy->cpus are made properly, to make sure that we never
> > > > > > ever schedule a work or irq-work on a CPU which is offline.
> > > > > > Now it looks like this isn't working as expected and we need to find
> what exactly is broken here.
> > > > > >
> > > > > > And yes, I did the testing on Hikey 620, an octa-core ARM
> > > > > > platform which has a single cpufreq policy which has all the 8
> > > > > > CPUs. And yes, I am using cpufreq-dt only and I wasn't able to
> > > > > > reproduce the problem with mainline kernel as I explained earlier.
> > > > > >
> > > > > > The problem is somewhere between the scheduler's governor hook
> > > > > running
> > > > > > or queuing work on a CPU which is in the middle of getting
> > > > > > offline/online and there is some race around that. The problem
> > > > > > hence may not be related to just cpufreq, but a wider variety of
> clients.
> > > > >
> > > > > The problem is that a CPU is running a governor hook which it
> > > > > shouldn't be running at all.
> > > > >
> > > > > > The observation that dvfs_possible_from_any_cpu makes a
> > > > > > difference only means that the governor hook is running on a CPU
> > > > > > that is not present in the policy->cpus mask.  On the platform(s)
> > > > > > in question this cannot happen as long as RCU works as expected.
> > > >
> > > > If I understand correctly, the governor hook ONLY be clear on the
> > > > CPU being offline and after governor stopped, but the CPU being
> > > > offline could still run into below function to help other CPU
> > > > update the util, and it ONLY checks the cpu_of(rq)'s governor hook
> > > > which is valid as that CPU is online.
> > > >
> > > > So the question is how to avoid the CPU being offline and already
> > > > finish the governor stop flow be scheduled to help other CPU
> > > > update the util.
> > > >
> > > >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> > > >  {
> > > >          struct update_util_data *data;
> > > >
> > > >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > > >                                                    cpu_of(rq)));
> > > >          if (data)
> > > >                  data->func(data, rq_clock(rq), flags);
> > > >  }
> > >
> > > OK, so that's where the problem is, good catch!
> > >
> > > So what happens is that a CPU going offline runs some scheduler code
> > > that invokes cpufreq_update_util().  Incidentally, it is not the
> > > cpu_of(rq), but that CPU is still online, so the callback is invoked
> > > and then policy->cpus test is bypassed because of
> > > dvfs_possible_from_any_cpu.
> >
> > If this is the issue, add another check here for the current CPU's governor
> > hook?
> > Or any other better place to make sure the CPU being offline NOT to be
> > queued to irq work?
> 
> Generally, yes.
> 
> Something like the patch below should help if I'm not mistaken:
> 
> ---
>  include/linux/cpufreq.h |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_
>  {
>  	/*
>  	 * Allow remote callbacks if:
> -	 * - dvfs_possible_from_any_cpu flag is set
>  	 * - the local and remote CPUs share cpufreq policy
> +	 * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> +	 *   code is not going offline.
>  	 */
> -	return policy->dvfs_possible_from_any_cpu ||
> -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> +		(policy->dvfs_possible_from_any_cpu &&
> +		 !cpumask_test_cpu(smp_processor_id(), policy->related_cpus));
>  }

I will start a stress test of this patch, thanks!

Anson

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:50                                                   ` Rafael J. Wysocki
  2019-12-10  8:51                                                     ` Anson Huang
@ 2019-12-10  8:57                                                     ` Viresh Kumar
  2019-12-10 11:03                                                       ` Rafael J. Wysocki
  2019-12-10  9:04                                                     ` Rafael J. Wysocki
  2 siblings, 1 reply; 57+ messages in thread
From: Viresh Kumar @ 2019-12-10  8:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Anson Huang, Rafael J. Wysocki, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

On 10-12-19, 09:50, Rafael J. Wysocki wrote:
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_
>  {
>  	/*
>  	 * Allow remote callbacks if:
> -	 * - dvfs_possible_from_any_cpu flag is set
>  	 * - the local and remote CPUs share cpufreq policy
> +	 * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> +	 *   code is not going offline.
>  	 */
> -	return policy->dvfs_possible_from_any_cpu ||
> -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> +		(policy->dvfs_possible_from_any_cpu &&
> +		 !cpumask_test_cpu(smp_processor_id(), policy->related_cpus));

This isn't enough as you are assuming that only a CPU from policy->related_cpus
can do remote processing. On an ARM platform (like Qcom's Krait, octa-core), all
8 CPUs have separate policies as they don't share clock lines, though they can
still do remote processing for each other as the clk registers are common.

Also policy->related_cpus can anyway update frequency for the policy even if
dvfs_possible_from_any_cpu is set to false.
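
To illustrate the first point, here is a minimal user-space sketch of the two
checks. It is only a model: struct policy_model, mask_test_cpu() and the two
can_update_*() helpers are hypothetical names, and plain 64-bit masks stand in
for the kernel's cpumasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpufreq_policy; one bit per CPU. */
struct policy_model {
	uint64_t cpus;          /* online CPUs currently managed by the policy */
	uint64_t related_cpus;  /* online and offline CPUs of the policy */
	bool dvfs_possible_from_any_cpu;
};

/* Models cpumask_test_cpu() on a 64-bit mask. */
static bool mask_test_cpu(unsigned int cpu, uint64_t mask)
{
	assert(cpu < 64);
	return (mask >> cpu) & 1;
}

/* The check as it stood before the proposed patch. */
static bool can_update_before(const struct policy_model *p,
			      unsigned int this_cpu)
{
	return p->dvfs_possible_from_any_cpu ||
		mask_test_cpu(this_cpu, p->cpus);
}

/* The check from the proposed patch (related_cpus test). */
static bool can_update_related(const struct policy_model *p,
			       unsigned int this_cpu)
{
	return mask_test_cpu(this_cpu, p->cpus) ||
		(p->dvfs_possible_from_any_cpu &&
		 !mask_test_cpu(this_cpu, p->related_cpus));
}
```

With a shared policy (cpus = 0x1, related_cpus = 0x3) and CPU1 going offline,
can_update_related() rejects CPU1 as intended. With per-CPU policies (target
policy has cpus = related_cpus = 0x1), CPU1 is not in related_cpus at all, so
the same call still returns true while CPU1 is going offline.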

-- 
viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:50                                                   ` Rafael J. Wysocki
  2019-12-10  8:51                                                     ` Anson Huang
  2019-12-10  8:57                                                     ` Viresh Kumar
@ 2019-12-10  9:04                                                     ` Rafael J. Wysocki
  2 siblings, 0 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10  9:04 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Viresh Kumar, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

On Tue, Dec 10, 2019 at 9:50 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote:
> >
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > Sent: Tuesday, December 10, 2019 4:38 PM
> > > To: Anson Huang <anson.huang@nxp.com>
> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael J.
> > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>; Peter
> > > Zijlstra <peterz@infradead.org>; Paul McKenney
> > > <paulmck@linux.vnet.ibm.com>
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > >
> > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com>
> > > wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>; Rafael J.
> > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>;
> > > > > Peter Zijlstra <peterz@infradead.org>; Paul McKenney
> > > > > <paulmck@linux.vnet.ibm.com>
> > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > > > >
> > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > > <viresh.kumar@linaro.org>
> > > > > wrote:
> > > > > >
> > > > > > +few more guys
> > > > > >
> > > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > > > But per
> > > > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > > > > cpu_of(rq) and smp_processor_id() is possible to not the same,
> > > > > > >
> > > > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > > >
> > > > > > > Okay, sorry for the long weekend where I couldn't get time to
> > > > > > > reply at all.
> > > > > No worries. :-)
> > > > >
> > > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > > >
> > > > > > Who can update the frequency of a CPU ? For many
> > > > > > architectures/platforms the eventual code that writes to some
> > > > > > register to change the frequency should only run on the local CPU,
> > > > > > as these registers are per-cpu registers and not something shared
> > > between CPUs.
> > > > > >
> > > > > > But for the ARM architecture, we have a PLL and then some more
> > > > > > registers to play with the clk provided to the CPU blocks and
> > > > > > these registers (which are updated as a result of clk_set_rate())
> > > > > > are part of a
> > > > > block outside of the CPU blocks.
> > > > > > And so any CPU (even if it is not part of the same cpufreq policy)
> > > > > > can update it. Setting this flag allows that and eventually we may
> > > > > > end up updating the frequency sooner, instead of later (which may
> > > > > > be less effective). That was the idea of the remote-wakeup series.
> > > > > > This stuff is absolutely correct and so cpufreq-dt does it for everyone.
> > > > > >
> > > > > > This also means that the normal work and irq-work both can run on
> > > > > > any CPU for your platform and it should be okay to do that.
> > > > >
> > > > > > And in the failing case all of the CPUs in the system are in the
> > > > > same policy anyway, so dvfs_possible_from_any_cpu is a red herring.
> > > > >
> > > > > > Now, we have necessary measures in place to make sure that after
> > > > > > stopping and before starting a governor, the scheduler hooks to
> > > > > > save the cpufreq governor pointer and updates to policy->cpus are
> > > > > > made properly, to make sure that we never ever schedule a work or
> > > > > > irq-work on a CPU which is offline. Now it looks like this isn't
> > > > > > working as expected and we need to find what exactly is broken here.
> > > > > >
> > > > > > And yes, I did the testing on Hikey 620, an octa-core ARM platform
> > > > > > which has a single cpufreq policy which has all the 8 CPUs. And
> > > > > > yes, I am using cpufreq-dt only and I wasn't able to reproduce the
> > > > > > problem with mainline kernel as I explained earlier.
> > > > > >
> > > > > > The problem is somewhere between the scheduler's governor hook
> > > > > running
> > > > > > or queuing work on a CPU which is in the middle of getting
> > > > > > offline/online and there is some race around that. The problem
> > > > > > hence may not be related to just cpufreq, but a wider variety of clients.
> > > > >
> > > > > The problem is that a CPU is running a governor hook which it
> > > > > shouldn't be running at all.
> > > > >
> > > > > > The observation that dvfs_possible_from_any_cpu makes a difference
> > > > > > only means that the governor hook is running on a CPU that is not
> > > > > > present in the policy->cpus mask.  On the platform(s) in question
> > > > > > this cannot happen as long as RCU works as expected.
> > > >
> > > > > If I understand correctly, the governor hook ONLY be clear on the CPU
> > > > > being offline and after governor stopped, but the CPU being offline
> > > > > could still run into below function to help other CPU update the util,
> > > > > and it ONLY checks the cpu_of(rq)'s governor hook which is valid as that
> > > > > CPU is online.
> > > > >
> > > > > So the question is how to avoid the CPU being offline and already
> > > > > finish the governor stop flow be scheduled to help other CPU update the
> > > > > util.
> > > > >
> > > > >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> > > > >  {
> > > > >          struct update_util_data *data;
> > > > >
> > > > >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > > > >                                                    cpu_of(rq)));
> > > > >          if (data)
> > > > >                  data->func(data, rq_clock(rq), flags);
> > > > >  }
> > >
> > > OK, so that's where the problem is, good catch!
> > >
> > > So what happens is that a CPU going offline runs some scheduler code that
> > > invokes cpufreq_update_util().  Incidentally, it is not the cpu_of(rq), but that
> > > CPU is still online, so the callback is invoked and then policy->cpus test is
> > > bypassed because of dvfs_possible_from_any_cpu.
> >
> > If this is the issue, add another check here for the current CPU's governor hook?
> > Or any other better place to make sure the CPU being offline NOT to be queued to irq work?
>
> Generally, yes.
>
> Something like the patch below should help if I'm not mistaken:
>
> ---
>  include/linux/cpufreq.h |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_
>  {
>         /*
>          * Allow remote callbacks if:
> -        * - dvfs_possible_from_any_cpu flag is set
>          * - the local and remote CPUs share cpufreq policy
> +        * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> +        *   code is not going offline.
>          */
> -       return policy->dvfs_possible_from_any_cpu ||
> -               cpumask_test_cpu(smp_processor_id(), policy->cpus);
> +       return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> +               (policy->dvfs_possible_from_any_cpu &&
> +                !cpumask_test_cpu(smp_processor_id(), policy->related_cpus));
>  }
>
>  /*********************************************************************
>

Actually, this is not sufficient, because the CPU going offline need
not belong to the same policy and if that is the case, the problem
will still occur AFAICS.

Please test it anyway so as to confirm that we are on the right track, though.

A better approach would be to queue the irq_work on a different CPU if
the CPU running the code is not in the policy->cpus set.

I'll cut a patch for that too in a minute.
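
As a rough illustration of that idea only (not the actual patch), the CPU
selection could look like the user-space sketch below. pick_irq_work_cpu() is
a hypothetical helper and a plain 64-bit mask stands in for policy->cpus; in
the kernel the chosen CPU would presumably be handed to irq_work_queue_on().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the CPU on which to queue the irq_work.
 * Prefer the local CPU if it belongs to the policy, otherwise fall back
 * to the first CPU set in the policy mask. Returns -1 if the mask is empty.
 */
static int pick_irq_work_cpu(uint64_t policy_cpus, unsigned int this_cpu)
{
	int cpu;

	assert(this_cpu < 64);

	/* The local CPU is in policy->cpus: keep the work local. */
	if ((policy_cpus >> this_cpu) & 1)
		return (int)this_cpu;

	/* Otherwise pick the lowest-numbered CPU of the policy. */
	for (cpu = 0; cpu < 64; cpu++)
		if ((policy_cpus >> cpu) & 1)
			return cpu;

	return -1;
}
```

With policy_cpus = 0x5 (CPU0 and CPU2), a call from CPU2 keeps the work on
CPU2, while a call from CPU1 redirects it to CPU0, so the work never lands on
a CPU outside the policy.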

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:51                                                     ` Anson Huang
@ 2019-12-10 10:39                                                       ` Rafael J. Wysocki
  2019-12-10 10:54                                                         ` Rafael J. Wysocki
  2019-12-10 10:54                                                         ` Viresh Kumar
  0 siblings, 2 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10 10:39 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Viresh Kumar, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

On Tuesday, December 10, 2019 9:51:43 AM CET Anson Huang wrote:
> 
> > -----Original Message-----
> > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > Sent: Tuesday, December 10, 2019 4:51 PM
> > To: Anson Huang <anson.huang@nxp.com>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Jacky Bai
> > <ping.bai@nxp.com>; linux-pm@vger.kernel.org; Vincent Guittot
> > <vincent.guittot@linaro.org>; Peter Zijlstra <peterz@infradead.org>; Paul
> > McKenney <paulmck@linux.vnet.ibm.com>
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > 
> > On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote:
> > >
> > > > -----Original Message-----
> > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > Sent: Tuesday, December 10, 2019 4:38 PM
> > > > To: Anson Huang <anson.huang@nxp.com>
> > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael J.
> > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>;
> > > > Peter Zijlstra <peterz@infradead.org>; Paul McKenney
> > > > <paulmck@linux.vnet.ibm.com>
> > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > > >
> > > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com>
> > > > wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>; Rafael
> > J.
> > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > > > driver
> > > > > >
> > > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > > > <viresh.kumar@linaro.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > +few more guys
> > > > > > >
> > > > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > > > > But per
> > > > > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > > > > > cpu_of(rq) and smp_processor_id() is possible to not the same,
> > > > > > > >
> > > > > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > > > >
> > > > > > > Okay, sorry for the long weekend where I couldn't get time to
> > > > > > > reply at
> > > > all.
> > > > > >
> > > > > > No worries. :-)
> > > > > >
> > > > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > > > >
> > > > > > > Who can update the frequency of a CPU ? For many
> > > > > > > architectures/platforms the eventual code that writes to some
> > > > > > > register to change the frequency should only run on the local
> > > > > > > CPU, as these registers are per-cpu registers and not
> > > > > > > something shared
> > > > between CPUs.
> > > > > > >
> > > > > > > But for the ARM architecture, we have a PLL and then some more
> > > > > > > registers to play with the clk provided to the CPU blocks and
> > > > > > > these registers (which are updated as a result of
> > > > > > > clk_set_rate()) are part of a
> > > > > > block outside of the CPU blocks.
> > > > > > > And so any CPU (even if it is not part of the same cpufreq
> > > > > > > policy) can update it. Setting this flag allows that and
> > > > > > > eventually we may end up updating the frequency sooner,
> > > > > > > instead of later (which may be less effective). That was the idea of
> > the remote-wakeup series.
> > > > > > > This stuff is absolutely correct and so cpufreq-dt does it for
> > everyone.
> > > > > > >
> > > > > > > This also means that the normal work and irq-work both can run
> > > > > > > on any CPU for your platform and it should be okay to do that.
> > > > > >
> > > > > > > And in the failing case all of the CPUs in the system are in the
> > > > > > same policy anyway, so dvfs_possible_from_any_cpu is a red herring.
> > > > > >
> > > > > > > Now, we have necessary measures in place to make sure that
> > > > > > > after stopping and before starting a governor, the scheduler
> > > > > > > hooks to save the cpufreq governor pointer and updates to
> > > > > > > policy->cpus are made properly, to make sure that we never
> > > > > > > ever schedule a work or irq-work on a CPU which is offline.
> > > > > > > Now it looks like this isn't working as expected and we need to find
> > what exactly is broken here.
> > > > > > >
> > > > > > > And yes, I did the testing on Hikey 620, an octa-core ARM
> > > > > > > platform which has a single cpufreq policy which has all the 8
> > > > > > > CPUs. And yes, I am using cpufreq-dt only and I wasn't able to
> > > > > > > reproduce the problem with mainline kernel as I explained earlier.
> > > > > > >
> > > > > > > The problem is somewhere between the scheduler's governor hook
> > > > > > running
> > > > > > > or queuing work on a CPU which is in the middle of getting
> > > > > > > offline/online and there is some race around that. The problem
> > > > > > > hence may not be related to just cpufreq, but a wider variety of
> > clients.
> > > > > >
> > > > > > The problem is that a CPU is running a governor hook which it
> > > > > > shouldn't be running at all.
> > > > > >
> > > > > > > The observation that dvfs_possible_from_any_cpu makes a
> > > > > > > difference only means that the governor hook is running on a CPU
> > > > > > > that is not present in the policy->cpus mask.  On the platform(s)
> > > > > > > in question this cannot happen as long as RCU works as expected.
> > > > >
> > > > > If I understand correctly, the governor hook ONLY be clear on the
> > > > > CPU being offline and after governor stopped, but the CPU being
> > > > > offline could still run into below function to help other CPU
> > > > > update the util, and it ONLY checks the cpu_of(rq)'s governor hook
> > > > > which is valid as that
> > > > CPU is online.
> > > > >
> > > > > So the question is how to avoid the CPU being offline and already
> > > > > finish the governor stop flow be scheduled to help other CPU
> > > > > update the
> > > > util.
> > > > >
> > > > > >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> > > > > >  {
> > > > > >          struct update_util_data *data;
> > > > > >
> > > > > >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > > > > >                                                    cpu_of(rq)));
> > > > > >          if (data)
> > > > > >                  data->func(data, rq_clock(rq), flags);
> > > > > >  }
> > > >
> > > > OK, so that's where the problem is, good catch!
> > > >
> > > > So what happens is that a CPU going offline runs some scheduler code
> > > > that invokes cpufreq_update_util().  Incidentally, it is not the
> > > > cpu_of(rq), but that CPU is still online, so the callback is invoked
> > > > and then policy->cpus test is bypassed because of
> > dvfs_possible_from_any_cpu.
> > >
> > > If this is the issue, add another check here for the current CPU's governor
> > hook?
> > > Or any other better place to make sure the CPU being offline NOT to be
> > queued to irq work?
> > 
> > Generally, yes.
> > 
> > Something like the patch below should help if I'm not mistaken:
> > 
> > ---
> >  include/linux/cpufreq.h |    8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > Index: linux-pm/include/linux/cpufreq.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/cpufreq.h
> > +++ linux-pm/include/linux/cpufreq.h
> > @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_
> >  {
> >  	/*
> >  	 * Allow remote callbacks if:
> > -	 * - dvfs_possible_from_any_cpu flag is set
> >  	 * - the local and remote CPUs share cpufreq policy
> > +	 * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> > +	 *   code is not going offline.
> >  	 */
> > -	return policy->dvfs_possible_from_any_cpu ||
> > -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > +		(policy->dvfs_possible_from_any_cpu &&
> > +		 !cpumask_test_cpu(smp_processor_id(), policy->related_cpus));
> >  }
> 
> I will start a stress test of this patch, thanks!

OK, thanks!

Another patch to test is appended and it should be more robust.

Instead of doing the related_cpus cpumask check in the previous patch (which
only covered CPUs that belong to the target policy) it checks if the update_util
hook is set for the local CPU (if it is not, that CPU is not expected to run
the update_util code).

---
 include/linux/cpufreq.h       |   11 -----------
 include/linux/sched/cpufreq.h |    3 +++
 kernel/sched/cpufreq.c        |   18 ++++++++++++++++++
 3 files changed, 21 insertions(+), 11 deletions(-)

Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -595,17 +595,6 @@ struct governor_attr {
 			 size_t count);
 };
 
-static inline bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
-{
-	/*
-	 * Allow remote callbacks if:
-	 * - dvfs_possible_from_any_cpu flag is set
-	 * - the local and remote CPUs share cpufreq policy
-	 */
-	return policy->dvfs_possible_from_any_cpu ||
-		cpumask_test_cpu(smp_processor_id(), policy->cpus);
-}
-
 /*********************************************************************
  *                     FREQUENCY TABLE HELPERS                       *
  *********************************************************************/
Index: linux-pm/kernel/sched/cpufreq.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq.c
+++ linux-pm/kernel/sched/cpufreq.c
@@ -5,6 +5,8 @@
  * Copyright (C) 2016, Intel Corporation
  * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  */
+#include <linux/cpufreq.h>
+
 #include "sched.h"
 
 DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
@@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
 	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
 }
 EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
+
+/**
+ * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
+ * @policy: cpufreq policy to check.
+ *
+ * Return 'true' if:
+ * - the local and remote CPUs share @policy,
+ * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
+ *   offline (in which it is not expected to run cpufreq updates any more).
+ */
+bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
+{
+	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
+		(policy->dvfs_possible_from_any_cpu &&
+		 rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
+}
Index: linux-pm/include/linux/sched/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/sched/cpufreq.h
+++ linux-pm/include/linux/sched/cpufreq.h
@@ -12,6 +12,8 @@
 #define SCHED_CPUFREQ_MIGRATION	(1U << 1)
 
 #ifdef CONFIG_CPU_FREQ
+struct cpufreq_policy;
+
 struct update_util_data {
        void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
 };
@@ -20,6 +22,7 @@ void cpufreq_add_update_util_hook(int cp
                        void (*func)(struct update_util_data *data, u64 time,
 				    unsigned int flags));
 void cpufreq_remove_update_util_hook(int cpu);
+bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy);
 
 static inline unsigned long map_util_freq(unsigned long util,
 					unsigned long freq, unsigned long cap)
 



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10 10:39                                                       ` Rafael J. Wysocki
  2019-12-10 10:54                                                         ` Rafael J. Wysocki
@ 2019-12-10 10:54                                                         ` Viresh Kumar
  2019-12-10 11:07                                                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 57+ messages in thread
From: Viresh Kumar @ 2019-12-10 10:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Anson Huang, Rafael J. Wysocki, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

On 10-12-19, 11:39, Rafael J. Wysocki wrote:
> Index: linux-pm/kernel/sched/cpufreq.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq.c
> +++ linux-pm/kernel/sched/cpufreq.c
> @@ -5,6 +5,8 @@
>   * Copyright (C) 2016, Intel Corporation
>   * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>   */
> +#include <linux/cpufreq.h>
> +
>  #include "sched.h"
>  
>  DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
> @@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
>  	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
> +
> +/**
> + * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
> + * @policy: cpufreq policy to check.
> + *
> + * Return 'true' if:
> + * - the local and remote CPUs share @policy,
> + * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
> + *   offline (in which it is not expected to run cpufreq updates any more).
> + */
> +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> +{
> +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> +		(policy->dvfs_possible_from_any_cpu &&

> +		 rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));

I somehow feel that doing this particular check in cpufreq_update_util() may be
better. Or maybe we can call cpufreq_this_cpu_can_update() itself right from
cpufreq_update_util() instead and remove it from multiple places in the
governors.
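
A user-space sketch of what I mean, with the hook-based check done once at the
entry point. Everything here is a model: hook_installed[] stands in for the
per-CPU cpufreq_update_util_data pointers, and update_util_model() is handed
the policy directly, which the real cpufreq_update_util() does not have, so
treat that part as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for struct cpufreq_policy; one bit per CPU. */
struct policy_model {
	uint64_t cpus;
	bool dvfs_possible_from_any_cpu;
};

/* true means the update_util hook is currently set for that CPU. */
static bool hook_installed[MODEL_NR_CPUS];

/* Counts how many times the governor callback would have run. */
static int updates_run;

/* Hook-based check: a CPU without a hook must not run remote updates. */
static bool can_update_hook(const struct policy_model *p,
			    unsigned int this_cpu)
{
	assert(this_cpu < MODEL_NR_CPUS);
	return ((p->cpus >> this_cpu) & 1) ||
		(p->dvfs_possible_from_any_cpu && hook_installed[this_cpu]);
}

/* Entry point model: this_cpu runs the update on behalf of target_cpu. */
static void update_util_model(const struct policy_model *target_policy,
			      unsigned int target_cpu, unsigned int this_cpu)
{
	assert(target_cpu < MODEL_NR_CPUS);

	if (!hook_installed[target_cpu])
		return;		/* no governor hook for the target CPU */

	if (!can_update_hook(target_policy, this_cpu))
		return;		/* local CPU is not allowed to update */

	updates_run++;		/* stands in for data->func(...) */
}
```

With the hook removed for CPU1 (governor already stopped there), a call made
on CPU1 for CPU0's policy is dropped before any governor code runs, which is
the behaviour the individual governors would otherwise each have to enforce.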

-- 
viresh

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10 10:39                                                       ` Rafael J. Wysocki
@ 2019-12-10 10:54                                                         ` Rafael J. Wysocki
  2019-12-11  5:08                                                           ` Anson Huang
  2019-12-11  8:59                                                           ` Peng Fan
  2019-12-10 10:54                                                         ` Viresh Kumar
  1 sibling, 2 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10 10:54 UTC (permalink / raw)
  To: Anson Huang
  Cc: Rafael J. Wysocki, Viresh Kumar, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

On Tuesday, December 10, 2019 11:39:25 AM CET Rafael J. Wysocki wrote:
> On Tuesday, December 10, 2019 9:51:43 AM CET Anson Huang wrote:
> > 
> > > -----Original Message-----
> > > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > Sent: Tuesday, December 10, 2019 4:51 PM
> > > To: Anson Huang <anson.huang@nxp.com>
> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Jacky Bai
> > > <ping.bai@nxp.com>; linux-pm@vger.kernel.org; Vincent Guittot
> > > <vincent.guittot@linaro.org>; Peter Zijlstra <peterz@infradead.org>; Paul
> > > McKenney <paulmck@linux.vnet.ibm.com>
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > > 
> > > On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > Sent: Tuesday, December 10, 2019 4:38 PM
> > > > > To: Anson Huang <anson.huang@nxp.com>
> > > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael J.
> > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>; linux-
> > > > > pm@vger.kernel.org; Vincent Guittot <vincent.guittot@linaro.org>;
> > > > > Peter Zijlstra <peterz@infradead.org>; Paul McKenney
> > > > > <paulmck@linux.vnet.ibm.com>
> > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > > > >
> > > > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang <anson.huang@nxp.com>
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > > > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>; Rafael
> > > J.
> > > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > > > > driver
> > > > > > >
> > > > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > > > > <viresh.kumar@linaro.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > +few more guys
> > > > > > > >
> > > > > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > > > > > But per
> > > > > > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > > > > > > cpu_of(rq) and smp_processor_id() is possible to not the same,
> > > > > > > > >
> > > > > > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > > > > >
> > > > > > > > > Okay, sorry for the long weekend where I couldn't get time to
> > > > > > > > > reply at all.
> > > > > > >
> > > > > > > No worries. :-)
> > > > > > >
> > > > > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > > > > >
> > > > > > > > Who can update the frequency of a CPU ? For many
> > > > > > > > architectures/platforms the eventual code that writes to some
> > > > > > > > register to change the frequency should only run on the local
> > > > > > > > CPU, as these registers are per-cpu registers and not
> > > > > > > > > something shared between CPUs.
> > > > > > > > >
> > > > > > > > > But for the ARM architecture, we have a PLL and then some more
> > > > > > > > > registers to play with the clk provided to the CPU blocks and
> > > > > > > > > these registers (which are updated as a result of
> > > > > > > > > clk_set_rate()) are part of a block outside of the CPU blocks.
> > > > > > > > > And so any CPU (even if it is not part of the same cpufreq
> > > > > > > > > policy) can update it. Setting this flag allows that and
> > > > > > > > > eventually we may end up updating the frequency sooner,
> > > > > > > > > instead of later (which may be less effective). That was the
> > > > > > > > > idea of the remote-wakeup series. This stuff is absolutely
> > > > > > > > > correct and so cpufreq-dt does it for everyone.
> > > > > > > >
> > > > > > > > This also means that the normal work and irq-work both can run
> > > > > > > > on any CPU for your platform and it should be okay to do that.
> > > > > > >
> > > > > > > > And in the failing case all of the CPUs in the system are in the
> > > > > > > same policy anyway, so dvfs_possible_from_any_cpu is a red herring.
> > > > > > >
> > > > > > > > Now, we have necessary measures in place to make sure that
> > > > > > > > after stopping and before starting a governor, the scheduler
> > > > > > > > hooks to save the cpufreq governor pointer and updates to
> > > > > > > > policy->cpus are made properly, to make sure that we never
> > > > > > > > ever schedule a work or irq-work on a CPU which is offline.
> > > > > > > > > Now it looks like this isn't working as expected and we need to
> > > > > > > > > find what exactly is broken here.
> > > > > > > >
> > > > > > > > And yes, I did the testing on Hikey 620, an octa-core ARM
> > > > > > > > platform which has a single cpufreq policy which has all the 8
> > > > > > > > CPUs. And yes, I am using cpufreq-dt only and I wasn't able to
> > > > > > > > reproduce the problem with mainline kernel as I explained earlier.
> > > > > > > >
> > > > > > > > > The problem is somewhere between the scheduler's governor hook
> > > > > > > > > running or queuing work on a CPU which is in the middle of
> > > > > > > > > getting offline/online and there is some race around that. The
> > > > > > > > > problem hence may not be related to just cpufreq, but a wider
> > > > > > > > > variety of clients.
> > > > > > >
> > > > > > > The problem is that a CPU is running a governor hook which it
> > > > > > > shouldn't be running at all.
> > > > > > >
> > > > > > > The observation that dvfs_possible_from_any_cpu makes a
> > > > > > > difference only means that the governor hook is running on a CPU
> > > > > > > that is not present in the
> > > > > > > > policy->cpus mask.  On the platform(s) in question this cannot
> > > > > > > > happen as long as RCU works as expected.
> > > > > >
> > > > > > > If I understand correctly, the governor hook is ONLY cleared for the
> > > > > > > CPU going offline, after the governor has been stopped, but the CPU
> > > > > > > going offline can still run the function below to update the
> > > > > > > utilization on behalf of another CPU, and that function ONLY checks
> > > > > > > the governor hook of cpu_of(rq), which is valid because that CPU is
> > > > > > > online.
> > > > > > >
> > > > > > > So the question is how to prevent a CPU that is going offline, and has
> > > > > > > already finished the governor stop flow, from being used to update the
> > > > > > > utilization for another CPU.
> > > > > >
> > > > > > >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> > > > > > >  {
> > > > > > >          struct update_util_data *data;
> > > > > > >
> > > > > > >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > > > > > >                                                    cpu_of(rq)));
> > > > > > >          if (data)
> > > > > > >                  data->func(data, rq_clock(rq), flags);
> > > > > > >  }
> > > > >
> > > > > OK, so that's where the problem is, good catch!
> > > > >
> > > > > So what happens is that a CPU going offline runs some scheduler code
> > > > > that invokes cpufreq_update_util().  Incidentally, it is not the
> > > > > cpu_of(rq), but that CPU is still online, so the callback is invoked
> > > > > > and then policy->cpus test is bypassed because of
> > > > > > dvfs_possible_from_any_cpu.
> > > >
> > > > > If this is the issue, should another check be added here for the current
> > > > > CPU's governor hook? Or is there a better place to make sure that no
> > > > > irq_work is queued on the CPU going offline?
> > > 
> > > Generally, yes.
> > > 
> > > Something like the patch below should help if I'm not mistaken:
> > > 
> > > ---
> > >  include/linux/cpufreq.h |    8 +++++---
> > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > 
> > > Index: linux-pm/include/linux/cpufreq.h
> > > ================================================================
> > > ===
> > > --- linux-pm.orig/include/linux/cpufreq.h
> > > +++ linux-pm/include/linux/cpufreq.h
> > > @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_  {
> > >  	/*
> > >  	 * Allow remote callbacks if:
> > > -	 * - dvfs_possible_from_any_cpu flag is set
> > >  	 * - the local and remote CPUs share cpufreq policy
> > > +	 * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> > > +	 *   code is not going offline.
> > >  	 */
> > > -	return policy->dvfs_possible_from_any_cpu ||
> > > -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > > +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > > +		(policy->dvfs_possible_from_any_cpu &&
> > > +		 !cpumask_test_cpu(smp_processor_id(), policy-
> > > >related_cpus));
> > >  }
> > 
> > I will start a stress test of this patch, thanks!
> 
> OK, thanks!
> 
> Another patch to test is appended and it should be more robust.
> 
> > Instead of doing the related_cpus cpumask check in the previous patch (which
> > only covered CPUs that belong to the target policy) it checks if the update_util
> > hook is set for the local CPU (if it is not, that CPU is not expected to run
> > the update_util code).

One more thing.

Neither of the previous patches would fix the schedutil governor, in which
cpufreq_this_cpu_can_update() is only called in the fast_switch case, and
that is not the case in which irq_works are used.

So please discard the patch I have just posted and here is an updated patch
that covers schedutil too, so please test this one instead.

---
 include/linux/cpufreq.h          |   11 -----------
 include/linux/sched/cpufreq.h    |    3 +++
 kernel/sched/cpufreq.c           |   18 ++++++++++++++++++
 kernel/sched/cpufreq_schedutil.c |    8 +++-----
 4 files changed, 24 insertions(+), 16 deletions(-)

Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -595,17 +595,6 @@ struct governor_attr {
 			 size_t count);
 };
 
-static inline bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
-{
-	/*
-	 * Allow remote callbacks if:
-	 * - dvfs_possible_from_any_cpu flag is set
-	 * - the local and remote CPUs share cpufreq policy
-	 */
-	return policy->dvfs_possible_from_any_cpu ||
-		cpumask_test_cpu(smp_processor_id(), policy->cpus);
-}
-
 /*********************************************************************
  *                     FREQUENCY TABLE HELPERS                       *
  *********************************************************************/
Index: linux-pm/kernel/sched/cpufreq.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq.c
+++ linux-pm/kernel/sched/cpufreq.c
@@ -5,6 +5,8 @@
  * Copyright (C) 2016, Intel Corporation
  * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  */
+#include <linux/cpufreq.h>
+
 #include "sched.h"
 
 DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
@@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
 	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
 }
 EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
+
+/**
+ * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
+ * @policy: cpufreq policy to check.
+ *
+ * Return 'true' if:
+ * - the local and remote CPUs share @policy,
+ * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
+ *   offline (in which it is not expected to run cpufreq updates any more).
+ */
+bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
+{
+	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
+		(policy->dvfs_possible_from_any_cpu &&
+		 rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
+}
Index: linux-pm/include/linux/sched/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/sched/cpufreq.h
+++ linux-pm/include/linux/sched/cpufreq.h
@@ -12,6 +12,8 @@
 #define SCHED_CPUFREQ_MIGRATION	(1U << 1)
 
 #ifdef CONFIG_CPU_FREQ
+struct cpufreq_policy;
+
 struct update_util_data {
        void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
 };
@@ -20,6 +22,7 @@ void cpufreq_add_update_util_hook(int cp
                        void (*func)(struct update_util_data *data, u64 time,
 				    unsigned int flags));
 void cpufreq_remove_update_util_hook(int cpu);
+bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy);
 
 static inline unsigned long map_util_freq(unsigned long util,
 					unsigned long freq, unsigned long cap)
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -82,12 +82,10 @@ static bool sugov_should_update_freq(str
 	 * by the hardware, as calculating the frequency is pointless if
 	 * we cannot in fact act on it.
 	 *
-	 * For the slow switching platforms, the kthread is always scheduled on
-	 * the right set of CPUs and any CPU can find the next frequency and
-	 * schedule the kthread.
+	 * This is needed on the slow switching platforms too to prevent CPUs
+	 * going offline from leaving stale IRQ work items behind.
 	 */
-	if (sg_policy->policy->fast_switch_enabled &&
-	    !cpufreq_this_cpu_can_update(sg_policy->policy))
+	if (!cpufreq_this_cpu_can_update(sg_policy->policy))
 		return false;
 
 	if (unlikely(sg_policy->limits_changed)) {
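
As a rough illustration of the schedutil hunk above, the guard in
sugov_should_update_freq() before and after the change can be modeled in
plain user-space C.  This is only a sketch, not kernel code: struct
guard_inputs and both helper names are made up for this example, and the
real function goes on to do more work after the guard.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two inputs the guard looks at. */
struct guard_inputs {
	bool fast_switch_enabled;   /* models policy->fast_switch_enabled */
	bool this_cpu_can_update;   /* models cpufreq_this_cpu_can_update() */
};

/* Guard before the patch: the check is skipped on slow-switch platforms. */
static bool old_guard_allows(struct guard_inputs in)
{
	if (in.fast_switch_enabled && !in.this_cpu_can_update)
		return false;
	return true;
}

/* Guard after the patch: the check applies regardless of the switch type. */
static bool new_guard_allows(struct guard_inputs in)
{
	if (!in.this_cpu_can_update)
		return false;
	return true;
}
```

With fast_switch_enabled false and a CPU that must not run the update, the
old guard lets the update proceed (so IRQ work may still be queued), while
the new guard rejects it.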




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10  8:57                                                     ` Viresh Kumar
@ 2019-12-10 11:03                                                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10 11:03 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Anson Huang, Rafael J. Wysocki, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

On Tuesday, December 10, 2019 9:57:44 AM CET Viresh Kumar wrote:
> On 10-12-19, 09:50, Rafael J. Wysocki wrote:
> > Index: linux-pm/include/linux/cpufreq.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/cpufreq.h
> > +++ linux-pm/include/linux/cpufreq.h
> > @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_
> >  {
> >  	/*
> >  	 * Allow remote callbacks if:
> > -	 * - dvfs_possible_from_any_cpu flag is set
> >  	 * - the local and remote CPUs share cpufreq policy
> > +	 * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> > +	 *   code is not going offline.
> >  	 */
> > -	return policy->dvfs_possible_from_any_cpu ||
> > -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > +		(policy->dvfs_possible_from_any_cpu &&
> > +		 !cpumask_test_cpu(smp_processor_id(), policy->related_cpus));
> 
> This isn't enough as you are assuming that only a CPU from policy->related_cpus
> can do remote processing. On a ARM platform (like Qcom's Krait, octa-core), all
> 8 CPUs have separate policies as they don't share clock lines. Though they can
> still do remote processing for each other as the clk registers are common.
> 
> Also policy->related_cpus can anyway update frequency for the policy even if
> dvfs_possible_from_any_cpu is set to false.

I know, see

https://lore.kernel.org/linux-pm/CAJZ5v0h0934-VBODZZJ8gEG2byuhQ+bomoCuTmmQZOBtqu5bKQ@mail.gmail.com/T/#mccadfdc557468072ab6c5525601a71d60070e99b
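
To make the objection quoted above concrete, here is a small user-space
model of the related_cpus-based check.  It is a sketch under assumptions,
not kernel code: CPU masks are modeled as plain bit sets, struct
model_policy and all function names are hypothetical, and the two scenarios
are one possible reading of the single-policy and one-policy-per-CPU cases
discussed in this thread.

```c
#include <stdbool.h>

/* Hypothetical, simplified policy: CPU masks as bit sets (bit n = CPU n). */
struct model_policy {
	unsigned int cpus;          /* online CPUs in the policy */
	unsigned int related_cpus;  /* online + offline CPUs in the policy */
	bool dvfs_possible_from_any_cpu;
};

static bool mask_test_cpu(int cpu, unsigned int mask)
{
	return (mask >> cpu) & 1u;
}

/* Models the related_cpus-based check, evaluated on CPU "this_cpu". */
static bool related_cpus_check(int this_cpu, const struct model_policy *p)
{
	return mask_test_cpu(this_cpu, p->cpus) ||
		(p->dvfs_possible_from_any_cpu &&
		 !mask_test_cpu(this_cpu, p->related_cpus));
}

/* One shared policy for CPUs 0-3; CPU 3 is going offline. */
static bool shared_policy_offlining_cpu_allowed(void)
{
	struct model_policy p = { 0x7, 0xf, true };

	return related_cpus_check(3, &p);
}

/* One policy per CPU; CPU 1 is going offline, the policy holds only CPU 0. */
static bool per_cpu_policy_offlining_cpu_allowed(void)
{
	struct model_policy p = { 0x1, 0x1, true };

	return related_cpus_check(1, &p);
}
```

In this model the check blocks the outgoing CPU when all CPUs share one
policy, but still lets it through when it belongs to a different policy,
which is the gap the hook-based check is meant to close.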




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10 10:54                                                         ` Viresh Kumar
@ 2019-12-10 11:07                                                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-10 11:07 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Anson Huang, Rafael J. Wysocki, Peng Fan,
	Jacky Bai, linux-pm, Vincent Guittot, Peter Zijlstra,
	Paul McKenney

On Tue, Dec 10, 2019 at 11:54 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 10-12-19, 11:39, Rafael J. Wysocki wrote:
> > Index: linux-pm/kernel/sched/cpufreq.c
> > ===================================================================
> > --- linux-pm.orig/kernel/sched/cpufreq.c
> > +++ linux-pm/kernel/sched/cpufreq.c
> > @@ -5,6 +5,8 @@
> >   * Copyright (C) 2016, Intel Corporation
> >   * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >   */
> > +#include <linux/cpufreq.h>
> > +
> >  #include "sched.h"
> >
> >  DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
> > @@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
> >       rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
> > +
> > +/**
> > + * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
> > + * @policy: cpufreq policy to check.
> > + *
> > + * Return 'true' if:
> > + * - the local and remote CPUs share @policy,
> > + * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
> > + *   offline (in which it is not expected to run cpufreq updates any more).
> > + */
> > +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> > +{
> > +     return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > +             (policy->dvfs_possible_from_any_cpu &&
>
> > +              rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
>
> I somehow feel that doing this particular check in cpufreq_update_util() maybe
> better. Or maybe we can call cpufreq_this_cpu_can_update() itself right from
> cpufreq_update_util() instead and remove it from multiple places in the
> governors.

First, there are two places actually.

Second, the point is that the presence of the hook only needs to be
checked if dvfs_possible_from_any_cpu is set and checking that in
cpufreq_update_util() would be kind of obnoxious IMO.
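
The evaluation order described above can be sketched in user space as
follows.  This is a model under assumptions, not the kernel implementation:
the per-CPU hook pointer is reduced to a boolean array, struct hook_policy,
model_hook_set, hook_lookups and the function names are all hypothetical,
and the scenario (CPUs 0-2 online with hooks, CPU 3 past governor stop) is
chosen only for illustration.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU flag: true while an update_util hook is installed. */
static bool model_hook_set[MODEL_NR_CPUS];

/* Counts how often the local hook had to be consulted. */
static int hook_lookups;

struct hook_policy {
	unsigned int cpus;   /* bit n set: CPU n is online in the policy */
	bool dvfs_possible_from_any_cpu;
};

static bool local_hook_present(int this_cpu)
{
	hook_lookups++;
	return model_hook_set[this_cpu];
}

/* Models the hook-based check, evaluated on CPU "this_cpu". */
static bool model_can_update(int this_cpu, const struct hook_policy *p)
{
	/* Short-circuit: the hook is only looked at if the flag is set. */
	return ((p->cpus >> this_cpu) & 1u) ||
		(p->dvfs_possible_from_any_cpu && local_hook_present(this_cpu));
}

/* CPUs 0-2 are online with hooks set; CPU 3 has had its hook removed. */
static bool offlining_cpu_may_update(bool dvfs_any)
{
	struct hook_policy p = { 0x7, dvfs_any };

	model_hook_set[0] = model_hook_set[1] = model_hook_set[2] = true;
	model_hook_set[3] = false;
	return model_can_update(3, &p);
}
```

In this model the outgoing CPU is rejected either way, and the hook is
consulted only when dvfs_possible_from_any_cpu is set, so platforms that
do not set the flag pay nothing extra for the check.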

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10 10:54                                                         ` Rafael J. Wysocki
@ 2019-12-11  5:08                                                           ` Anson Huang
  2019-12-11  8:59                                                           ` Peng Fan
  1 sibling, 0 replies; 57+ messages in thread
From: Anson Huang @ 2019-12-11  5:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Viresh Kumar, Peng Fan, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney



> 
> One more thing.
> 
> Both of the previous patches would not fix the schedutil governor in which
> cpufreq_this_cpu_can_update() only is called in the fast_switch case and that
> is not when irq_works are used.
> 
> So please discard the patch I have just posted and here is an updated patch
> that covers schedutil too, so please test this one instead.
> 
> ---
>  include/linux/cpufreq.h          |   11 -----------
>  include/linux/sched/cpufreq.h    |    3 +++
>  kernel/sched/cpufreq.c           |   18 ++++++++++++++++++
>  kernel/sched/cpufreq_schedutil.c |    8 +++-----
>  4 files changed, 24 insertions(+), 16 deletions(-)
> 
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -595,17 +595,6 @@ struct governor_attr {
>  			 size_t count);
>  };
> 
> -static inline bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> -{
> -	/*
> -	 * Allow remote callbacks if:
> -	 * - dvfs_possible_from_any_cpu flag is set
> -	 * - the local and remote CPUs share cpufreq policy
> -	 */
> -	return policy->dvfs_possible_from_any_cpu ||
> -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> -}
> -
>  /*********************************************************************
>   *                     FREQUENCY TABLE HELPERS                       *
>   *********************************************************************/
> Index: linux-pm/kernel/sched/cpufreq.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq.c
> +++ linux-pm/kernel/sched/cpufreq.c
> @@ -5,6 +5,8 @@
>   * Copyright (C) 2016, Intel Corporation
>   * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>   */
> +#include <linux/cpufreq.h>
> +
>  #include "sched.h"
> 
>  DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
> @@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
>  	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
> +
> +/**
> + * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
> + * @policy: cpufreq policy to check.
> + *
> + * Return 'true' if:
> + * - the local and remote CPUs share @policy,
> + * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
> + *   offline (in which it is not expected to run cpufreq updates any more).
> + */
> +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> +{
> +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> +		(policy->dvfs_possible_from_any_cpu &&
> +		 rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
> +}
> Index: linux-pm/include/linux/sched/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/sched/cpufreq.h
> +++ linux-pm/include/linux/sched/cpufreq.h
> @@ -12,6 +12,8 @@
>  #define SCHED_CPUFREQ_MIGRATION	(1U << 1)
> 
>  #ifdef CONFIG_CPU_FREQ
> +struct cpufreq_policy;
> +
>  struct update_util_data {
>         void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
>  };
> @@ -20,6 +22,7 @@ void cpufreq_add_update_util_hook(int cp
>                         void (*func)(struct update_util_data *data, u64 time,
>  				    unsigned int flags));
>  void cpufreq_remove_update_util_hook(int cpu);
> +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy);
> 
>  static inline unsigned long map_util_freq(unsigned long util,
>  					unsigned long freq, unsigned long cap)
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -82,12 +82,10 @@ static bool sugov_should_update_freq(str
>  	 * by the hardware, as calculating the frequency is pointless if
>  	 * we cannot in fact act on it.
>  	 *
> -	 * For the slow switching platforms, the kthread is always scheduled on
> -	 * the right set of CPUs and any CPU can find the next frequency and
> -	 * schedule the kthread.
> +	 * This is needed on the slow switching platforms too to prevent CPUs
> +	 * going offline from leaving stale IRQ work items behind.
>  	 */
> -	if (sg_policy->policy->fast_switch_enabled &&
> -	    !cpufreq_this_cpu_can_update(sg_policy->policy))
> +	if (!cpufreq_this_cpu_can_update(sg_policy->policy))
>  		return false;
> 
>  	if (unlikely(sg_policy->limits_changed)) {

This patch has been running fine so far on our i.MX8 platforms, both single-cluster
and dual-cluster SoCs. It has passed 3 hours of testing (> 5000 iterations) and I will
let it continue to run for the whole day.

And I will add this patch to our internal tree as a hot fix for now.

Thanks a lot, everyone, for the help on this issue!

Anson


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-10 10:54                                                         ` Rafael J. Wysocki
  2019-12-11  5:08                                                           ` Anson Huang
@ 2019-12-11  8:59                                                           ` Peng Fan
  2019-12-11  9:36                                                             ` Rafael J. Wysocki
  1 sibling, 1 reply; 57+ messages in thread
From: Peng Fan @ 2019-12-11  8:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, Anson Huang
  Cc: Rafael J. Wysocki, Viresh Kumar, Jacky Bai, linux-pm,
	Vincent Guittot, Peter Zijlstra, Paul McKenney

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Tuesday, December 10, 2019 11:39:25 AM CET Rafael J. Wysocki wrote:
> > On Tuesday, December 10, 2019 9:51:43 AM CET Anson Huang wrote:
> > >
> > > > -----Original Message-----
> > > > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > Sent: Tuesday, December 10, 2019 4:51 PM
> > > > To: Anson Huang <anson.huang@nxp.com>
> > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Jacky Bai
> > > > <ping.bai@nxp.com>; linux-pm@vger.kernel.org; Vincent Guittot
> > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > <peterz@infradead.org>; Paul McKenney
> <paulmck@linux.vnet.ibm.com>
> > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > driver
> > > >
> > > > On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > Sent: Tuesday, December 10, 2019 4:38 PM
> > > > > > To: Anson Huang <anson.huang@nxp.com>
> > > > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael
> J.
> > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > > > driver
> > > > > >
> > > > > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang
> > > > > > <anson.huang@nxp.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > > > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > > > > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>;
> > > > > > > > Rafael
> > > > J.
> > > > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > > > Subject: Re: About CPU hot-plug stress test failed in
> > > > > > > > cpufreq driver
> > > > > > > >
> > > > > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > > > > > <viresh.kumar@linaro.org>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > +few more guys
> > > > > > > > >
> > > > > > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > > > > > But per
> > > > > > > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > > > > > > cpu_of(rq) and smp_processor_id() is possible to not
> > > > > > > > > > the same,
> > > > > > > > > >
> > > > > > > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > > > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > > > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > > > > > >
> > > > > > > > > Okay, sorry for the long weekend where I couldn't get
> > > > > > > > > time to reply at
> > > > > > all.
> > > > > > > >
> > > > > > > > No worries. :-)
> > > > > > > >
> > > > > > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > > > > > >
> > > > > > > > > Who can update the frequency of a CPU ? For many
> > > > > > > > > architectures/platforms the eventual code that writes to
> > > > > > > > > some register to change the frequency should only run on
> > > > > > > > > the local CPU, as these registers are per-cpu registers
> > > > > > > > > and not something shared
> > > > > > between CPUs.
> > > > > > > > >
> > > > > > > > > But for the ARM architecture, we have a PLL and then
> > > > > > > > > some more registers to play with the clk provided to the
> > > > > > > > > CPU blocks and these registers (which are updated as a
> > > > > > > > > result of
> > > > > > > > > clk_set_rate()) are part of a
> > > > > > > > block outside of the CPU blocks.
> > > > > > > > > And so any CPU (even if it is not part of the same
> > > > > > > > > cpufreq
> > > > > > > > > policy) can update it. Setting this flag allows that and
> > > > > > > > > eventually we may end up updating the frequency sooner,
> > > > > > > > > instead of later (which may be less effective). That was
> > > > > > > > > the idea of
> > > > the remote-wakeup series.
> > > > > > > > > This stuff is absolutely correct and so cpufreq-dt does
> > > > > > > > > it for
> > > > everyone.
> > > > > > > > >
> > > > > > > > > This also means that the normal work and irq-work both
> > > > > > > > > can run on any CPU for your platform and it should be okay to
> do that.
> > > > > > > >
> > > > > > > > > And in the failing case all of the CPUs in the system are
> > > > > > > > > in the same policy anyway, so dvfs_possible_from_any_cpu is a
> > > > > > > > > red herring.
> > > > > > > >
> > > > > > > > > Now, we have necessary measures in place to make sure
> > > > > > > > > that after stopping and before starting a governor, the
> > > > > > > > > scheduler hooks to save the cpufreq governor pointer and
> > > > > > > > > updates to
> > > > > > > > > > policy->cpus are made properly, to make sure that we never
> > > > > > > > > > ever schedule a work or irq-work on a CPU which is offline.
> > > > > > > > > Now it looks like this isn't working as expected and we
> > > > > > > > > need to find
> > > > what exactly is broken here.
> > > > > > > > >
> > > > > > > > > And yes, I did the testing on Hikey 620, an octa-core
> > > > > > > > > ARM platform which has a single cpufreq policy which has
> > > > > > > > > all the 8 CPUs. And yes, I am using cpufreq-dt only and
> > > > > > > > > I wasn't able to reproduce the problem with mainline kernel as
> I explained earlier.
> > > > > > > > >
> > > > > > > > > The problem is somewhere between the scheduler's
> > > > > > > > > governor hook
> > > > > > > > running
> > > > > > > > > or queuing work on a CPU which is in the middle of
> > > > > > > > > getting offline/online and there is some race around
> > > > > > > > > that. The problem hence may not be related to just
> > > > > > > > > cpufreq, but a wider variety of
> > > > clients.
> > > > > > > >
> > > > > > > > The problem is that a CPU is running a governor hook which
> > > > > > > > it shouldn't be running at all.
> > > > > > > >
> > > > > > > > The observation that dvfs_possible_from_any_cpu makes a
> > > > > > > > difference only means that the governor hook is running on
> > > > > > > > a CPU that is not present in the
> > > > > > > > > policy->cpus mask.  On the platform(s) in question this
> > > > > > > > > cannot happen as long as RCU works as expected.
> > > > > > >
> > > > > > > > If I understand correctly, the governor hook is ONLY cleared
> > > > > > > > for the CPU going offline, after the governor has been stopped,
> > > > > > > > but the CPU going offline can still run the function below to
> > > > > > > > update the utilization on behalf of another CPU, and that
> > > > > > > > function ONLY checks the governor hook of cpu_of(rq), which is
> > > > > > > > valid because that CPU is online.
> > > > > > > >
> > > > > > > > So the question is how to prevent a CPU that is going offline,
> > > > > > > > and has already finished the governor stop flow, from being
> > > > > > > > used to update the utilization for another CPU.
> > > > > > >
> > > > > > > >  static inline void cpufreq_update_util(struct rq *rq,
> > > > > > > >                                         unsigned int flags)
> > > > > > > >  {
> > > > > > > >          struct update_util_data *data;
> > > > > > > >
> > > > > > > >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > > > > > > >                                                    cpu_of(rq)));
> > > > > > > >          if (data)
> > > > > > > >                  data->func(data, rq_clock(rq), flags);
> > > > > > > >  }
> > > > > >
> > > > > > OK, so that's where the problem is, good catch!
> > > > > >
> > > > > > So what happens is that a CPU going offline runs some
> > > > > > scheduler code that invokes cpufreq_update_util().
> > > > > > Incidentally, it is not the cpu_of(rq), but that CPU is still
> > > > > > online, so the callback is invoked and then policy->cpus test
> > > > > > is bypassed because of
> > > > dvfs_possible_from_any_cpu.
> > > > >
> > > > > > If this is the issue, should another check be added here for the
> > > > > > current CPU's governor hook? Or is there a better place to make
> > > > > > sure that no irq_work is queued on the CPU going offline?
> > > >
> > > > Generally, yes.
> > > >
> > > > Something like the patch below should help if I'm not mistaken:
> > > >
> > > > ---
> > > >  include/linux/cpufreq.h |    8 +++++---
> > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > >
> > > > Index: linux-pm/include/linux/cpufreq.h
> > > >
> ==============================================================
> ==
> > > > ===
> > > > --- linux-pm.orig/include/linux/cpufreq.h
> > > > +++ linux-pm/include/linux/cpufreq.h
> > > > @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_
> > > >  {
> > > >  	/*
> > > >  	 * Allow remote callbacks if:
> > > > -	 * - dvfs_possible_from_any_cpu flag is set
> > > >  	 * - the local and remote CPUs share cpufreq policy
> > > > +	 * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> > > > +	 *   code is not going offline.
> > > >  	 */
> > > > -	return policy->dvfs_possible_from_any_cpu ||
> > > > -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > > > +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > > > +		(policy->dvfs_possible_from_any_cpu &&
> > > > +		 !cpumask_test_cpu(smp_processor_id(), policy->related_cpus));
> > > >  }
> > >
> > > I will start a stress test of this patch, thanks!
> >
> > OK, thanks!
> >
> > Another patch to test is appended and it should be more robust.
> >
> > Instead of doing the related_cpus cpumask check in the previous patch
> > (which only covered CPUs that belong to the target policy) it checks if
> > the update_util hook is set for the local CPU (if it is not, that CPU
> > is not expected to run the update_util code).
> 
> One more thing.
> 
> Both of the previous patches would not fix the schedutil governor in which
> cpufreq_this_cpu_can_update() only is called in the fast_switch case and that
> is not when irq_works are used.
> 
> So please discard the patch I have just posted and here is an updated patch
> that covers schedutil too, so please test this one instead.
> 
> ---
>  include/linux/cpufreq.h          |   11 -----------
>  include/linux/sched/cpufreq.h    |    3 +++
>  kernel/sched/cpufreq.c           |   18 ++++++++++++++++++
>  kernel/sched/cpufreq_schedutil.c |    8 +++-----
>  4 files changed, 24 insertions(+), 16 deletions(-)
> 
> Index: linux-pm/include/linux/cpufreq.h
> ==============================================================
> =====
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -595,17 +595,6 @@ struct governor_attr {
>  			 size_t count);
>  };
> 
> -static inline bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> -{
> -	/*
> -	 * Allow remote callbacks if:
> -	 * - dvfs_possible_from_any_cpu flag is set
> -	 * - the local and remote CPUs share cpufreq policy
> -	 */
> -	return policy->dvfs_possible_from_any_cpu ||
> -		cpumask_test_cpu(smp_processor_id(), policy->cpus);
> -}
> -
>  /*********************************************************************
>   *                     FREQUENCY TABLE HELPERS                       *
>   *********************************************************************/
> Index: linux-pm/kernel/sched/cpufreq.c
> ==============================================================
> =====
> --- linux-pm.orig/kernel/sched/cpufreq.c
> +++ linux-pm/kernel/sched/cpufreq.c
> @@ -5,6 +5,8 @@
>   * Copyright (C) 2016, Intel Corporation
>   * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>   */
> +#include <linux/cpufreq.h>
> +
>  #include "sched.h"
> 
>  DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
> @@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
>  	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
> +
> +/**
> + * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
> + * @policy: cpufreq policy to check.
> + *
> + * Return 'true' if:
> + * - the local and remote CPUs share @policy,
> + * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
> + *   offline (in which case it is not expected to run cpufreq updates any more).
> + */
> +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> +{
> +	return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> +		(policy->dvfs_possible_from_any_cpu &&
> +		 rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
> +}
> Index: linux-pm/include/linux/sched/cpufreq.h
> ==============================================================
> =====
> --- linux-pm.orig/include/linux/sched/cpufreq.h
> +++ linux-pm/include/linux/sched/cpufreq.h
> @@ -12,6 +12,8 @@
>  #define SCHED_CPUFREQ_MIGRATION	(1U << 1)
> 
>  #ifdef CONFIG_CPU_FREQ
> +struct cpufreq_policy;
> +
>  struct update_util_data {
>         void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
>  };
> @@ -20,6 +22,7 @@ void cpufreq_add_update_util_hook(int cp
>                         void (*func)(struct update_util_data *data, u64 time,
>  				    unsigned int flags));
>  void cpufreq_remove_update_util_hook(int cpu);
> +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy);
> 
>  static inline unsigned long map_util_freq(unsigned long util,
>  					unsigned long freq, unsigned long cap)
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ==============================================================
> =====
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -82,12 +82,10 @@ static bool sugov_should_update_freq(str
>  	 * by the hardware, as calculating the frequency is pointless if
>  	 * we cannot in fact act on it.
>  	 *
> -	 * For the slow switching platforms, the kthread is always scheduled on
> -	 * the right set of CPUs and any CPU can find the next frequency and
> -	 * schedule the kthread.
> +	 * This is needed on the slow switching platforms too to prevent CPUs
> +	 * going offline from leaving stale IRQ work items behind.
>  	 */
> -	if (sg_policy->policy->fast_switch_enabled &&
> -	    !cpufreq_this_cpu_can_update(sg_policy->policy))
> +	if (!cpufreq_this_cpu_can_update(sg_policy->policy))
>  		return false;
> 
>  	if (unlikely(sg_policy->limits_changed)) {

So with your patch we will not queue irq_work on the CPU that is going
offline.

When we hit the issue on CPU3, CPU3 had an irq_work pending, but the
SGI IRQ_WORK interrupt was never handled because interrupts stayed
disabled. See the stack below, captured in idle with IRQs disabled.

[  227.344678] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.4.0-03554-gbb1159fa5556-dirty #95
[  227.344682] Hardware name: Freescale i.MX8QXP MEK (DT)
[  227.344686] Call trace:
[  227.344701]  dump_backtrace+0x0/0x140
[  227.344708]  show_stack+0x14/0x20
[  227.344717]  dump_stack+0xb4/0xf8
[  227.344730]  dbs_update_util_handler+0x150/0x180
[  227.344739]  update_load_avg+0x38c/0x3c8
[  227.344746]  enqueue_task_fair+0xcc/0x3a0
[  227.344756]  activate_task+0x5c/0xa0
[  227.344766]  ttwu_do_activate.isra.0+0x4c/0x70
[  227.344776]  try_to_wake_up+0x2d8/0x410
[  227.344786]  wake_up_process+0x14/0x20
[  227.344794]  swake_up_locked.part.0+0x18/0x38
[  227.344801]  swake_up_one+0x30/0x48
[  227.344808]  rcu_gp_kthread_wake+0x5c/0x80
[  227.344815]  rcu_report_qs_rsp+0x40/0x50
[  227.344825]  rcu_report_qs_rnp+0x120/0x148
[  227.344832]  rcu_report_dead+0x120/0x130
[  227.344841]  cpuhp_report_idle_dead+0x3c/0x80
[  227.344847]  do_idle+0x198/0x280
[  227.344856]  cpu_startup_entry+0x24/0x40
[  227.344865]  secondary_start_kernel+0x154/0x190
[  227.344905] CPU3: shutdown
[  227.444015] psci: CPU3 killed.


I also saw an irq_work queued on CPU1 while it was going offline, but
CPU1 did not trigger the issue, because the SGI IRQ_WORK interrupt was
handled there.

There are multiple paths that reach irq_work_queue() in
dbs_update_util_handler(); some of them enable interrupts afterwards
and some do not.

It seems do_idle() is the only path that triggers cpu_die() for CPU
hotplug on ARM/ARM64.

So should we use idle as the flag that decides whether or not to queue
irq_work? That way we could still queue irq_work on an offlining/offline
CPU until it runs into idle on its way to cpu_die().
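
To make the CPU3 vs CPU1 difference concrete, here is a small userspace
model (illustration only, not kernel code). All names prefixed with
model_ are hypothetical; the pending flag is a simplification of the
real irq_work state, and the bounded poll loop only stands in for a
wait that, as reported in this thread, never finishes when the target
CPU no longer takes the IRQ_WORK interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct irq_work: only a pending flag. */
struct model_irq_work {
	bool pending;
};

/* Hypothetical stand-in for a CPU on its way offline. */
struct model_cpu {
	bool takes_interrupts;         /* false once it idles with IRQs off */
	struct model_irq_work *queued; /* at most one item in this model */
};

/* Queue @work on @cpu unless it is already pending. */
static bool model_irq_work_queue(struct model_cpu *cpu,
				 struct model_irq_work *work)
{
	if (work->pending)
		return false;
	work->pending = true;
	cpu->queued = work;
	return true;
}

/* The IRQ_WORK IPI only runs the item if the CPU still takes interrupts. */
static void model_deliver_ipi(struct model_cpu *cpu)
{
	if (!cpu->takes_interrupts || cpu->queued == NULL)
		return;
	cpu->queued->pending = false;
	cpu->queued = NULL;
}

/*
 * Bounded wait for @work to complete. Returns true if the item drained
 * within @max_polls attempts; false models the hang seen in the test.
 */
static bool model_irq_work_sync(struct model_cpu *cpu,
				const struct model_irq_work *work,
				int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		model_deliver_ipi(cpu);
		if (!work->pending)
			return true;
	}
	return false;
}
```

With takes_interrupts set (the CPU1 case) the wait completes; with it
cleared (the CPU3 case) the item stays pending, and it cannot be queued
again either.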

Thanks,
Peng.

> 
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-11  8:59                                                           ` Peng Fan
@ 2019-12-11  9:36                                                             ` Rafael J. Wysocki
  2019-12-11  9:43                                                               ` Peng Fan
  0 siblings, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-11  9:36 UTC (permalink / raw)
  To: Peng Fan
  Cc: Rafael J. Wysocki, Anson Huang, Rafael J. Wysocki, Viresh Kumar,
	Jacky Bai, linux-pm, Vincent Guittot, Peter Zijlstra,
	Paul McKenney

On Wed, Dec 11, 2019 at 9:59 AM Peng Fan <peng.fan@nxp.com> wrote:
>
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> >
> > On Tuesday, December 10, 2019 11:39:25 AM CET Rafael J. Wysocki wrote:
> > > On Tuesday, December 10, 2019 9:51:43 AM CET Anson Huang wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > > Sent: Tuesday, December 10, 2019 4:51 PM
> > > > > To: Anson Huang <anson.huang@nxp.com>
> > > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Jacky Bai
> > > > > <ping.bai@nxp.com>; linux-pm@vger.kernel.org; Vincent Guittot
> > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > <peterz@infradead.org>; Paul McKenney
> > <paulmck@linux.vnet.ibm.com>
> > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > > driver
> > > > >
> > > > > On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang wrote:
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > > Sent: Tuesday, December 10, 2019 4:38 PM
> > > > > > > To: Anson Huang <anson.huang@nxp.com>
> > > > > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Rafael
> > J.
> > > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > > > > driver
> > > > > > >
> > > > > > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang
> > > > > > > <anson.huang@nxp.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > > > > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > > > > > > <rafael@kernel.org>; Anson Huang <anson.huang@nxp.com>;
> > > > > > > > > Rafael
> > > > > J.
> > > > > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > > > > Subject: Re: About CPU hot-plug stress test failed in
> > > > > > > > > cpufreq driver
> > > > > > > > >
> > > > > > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > > > > > > <viresh.kumar@linaro.org>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > +few more guys
> > > > > > > > > >
> > > > > > > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > > > > > > But per
> > > > > > > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > > > > > > > cpu_of(rq) and smp_processor_id() are not necessarily
> > > > > > > > > > > the same.
> > > > > > > > > > >
> > > > > > > > > > > When cpu_of(rq) is not equal to smp_processor_id(),
> > > > > > > > > > > dbs_update_util_handler will use irq_work_queue to
> > > > > > > > > > > smp_processor_id(), not cpu_of(rq). Is this expected?
> > > > > > > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > > > > > > >
> > > > > > > > > > Okay, sorry for the long weekend where I couldn't get
> > > > > > > > > > time to reply at
> > > > > > > all.
> > > > > > > > >
> > > > > > > > > No worries. :-)
> > > > > > > > >
> > > > > > > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > > > > > > >
> > > > > > > > > > Who can update the frequency of a CPU ? For many
> > > > > > > > > > architectures/platforms the eventual code that writes to
> > > > > > > > > > some register to change the frequency should only run on
> > > > > > > > > > the local CPU, as these registers are per-cpu registers
> > > > > > > > > > and not something shared
> > > > > > > between CPUs.
> > > > > > > > > >
> > > > > > > > > > But for the ARM architecture, we have a PLL and then
> > > > > > > > > > some more registers to play with the clk provided to the
> > > > > > > > > > CPU blocks and these registers (which are updated as a
> > > > > > > > > > result of
> > > > > > > > > > clk_set_rate()) are part of a
> > > > > > > > > block outside of the CPU blocks.
> > > > > > > > > > And so any CPU (even if it is not part of the same
> > > > > > > > > > cpufreq
> > > > > > > > > > policy) can update it. Setting this flag allows that and
> > > > > > > > > > eventually we may end up updating the frequency sooner,
> > > > > > > > > > instead of later (which may be less effective). That was
> > > > > > > > > > the idea of
> > > > > the remote-wakeup series.
> > > > > > > > > > This stuff is absolutely correct and so cpufreq-dt does
> > > > > > > > > > it for
> > > > > everyone.
> > > > > > > > > >
> > > > > > > > > > This also means that the normal work and irq-work both
> > > > > > > > > > can run on any CPU for your platform and it should be okay to
> > do that.
> > > > > > > > >
> > > > > > > > > And in the failing case all of the CPUs in the system are
> > > > > > > > > in the same policy anyway, so dvfs_possible_from_any_cpu is a
> > > > > > > > > red herring.
> > > > > > > > >
> > > > > > > > > > Now, we have necessary measures in place to make sure
> > > > > > > > > > that after stopping and before starting a governor, the
> > > > > > > > > > scheduler hooks to save the cpufreq governor pointer and
> > > > > > > > > > updates to policy->cpus are made properly, to make sure
> > > > > > > > > > that we never ever schedule a work or irq-work on a CPU
> > > > > > > > > > which is offline.
> > > > > > > > > > Now it looks like this isn't working as expected and we
> > > > > > > > > > need to find
> > > > > what exactly is broken here.
> > > > > > > > > >
> > > > > > > > > > And yes, I did the testing on Hikey 620, an octa-core
> > > > > > > > > > ARM platform which has a single cpufreq policy which has
> > > > > > > > > > all the 8 CPUs. And yes, I am using cpufreq-dt only and
> > > > > > > > > > I wasn't able to reproduce the problem with mainline kernel as
> > I explained earlier.
> > > > > > > > > >
> > > > > > > > > > The problem is somewhere between the scheduler's
> > > > > > > > > > governor hook
> > > > > > > > > running
> > > > > > > > > > or queuing work on a CPU which is in the middle of
> > > > > > > > > > getting offline/online and there is some race around
> > > > > > > > > > that. The problem hence may not be related to just
> > > > > > > > > > cpufreq, but a wider variety of
> > > > > clients.
> > > > > > > > >
> > > > > > > > > The problem is that a CPU is running a governor hook which
> > > > > > > > > it shouldn't be running at all.
> > > > > > > > >
> > > > > > > > > The observation that dvfs_possible_from_any_cpu makes a
> > > > > > > > > difference only means that the governor hook is running on
> > > > > > > > > a CPU that is not present in the
> > > > > > > > > policy->cpus mask.  On the platform(s) in question this cannot
> > > > > > > > > happen as long as RCU works as expected.
> > > > > > > >
> > > > > > > > If I understand correctly, the governor hook ONLY be clear
> > > > > > > > on the CPU being offline and after governor stopped, but the
> > > > > > > > CPU being offline could still run into below function to
> > > > > > > > help other CPU update the util, and it ONLY checks the
> > > > > > > > cpu_of(rq)'s governor hook which is valid as that
> > > > > > > CPU is online.
> > > > > > > >
> > > > > > > > So the question is how to avoid the CPU being offline and
> > > > > > > > already finish the governor stop flow be scheduled to help
> > > > > > > > other CPU update the
> > > > > > > util.
> > > > > > > >
> > > > > > > >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> > > > > > > >  {
> > > > > > > >          struct update_util_data *data;
> > > > > > > >
> > > > > > > >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > > > > > > >                                                    cpu_of(rq)));
> > > > > > > >          if (data)
> > > > > > > >                  data->func(data, rq_clock(rq), flags);
> > > > > > > >  }
> > > > > > >
> > > > > > > OK, so that's where the problem is, good catch!
> > > > > > >
> > > > > > > So what happens is that a CPU going offline runs some
> > > > > > > scheduler code that invokes cpufreq_update_util().
> > > > > > > Incidentally, it is not the cpu_of(rq), but that CPU is still
> > > > > > > online, so the callback is invoked and then policy->cpus test
> > > > > > > is bypassed because of
> > > > > dvfs_possible_from_any_cpu.
> > > > > >
> > > > > > If this is the issue, add another check here for the current
> > > > > > CPU's governor
> > > > > hook?
> > > > > > Or any other better place to make sure the CPU being offline NOT
> > > > > > to be
> > > > > queued to irq work?
> > > > >
> > > > > Generally, yes.
> > > > >
> > > > > Something like the patch below should help if I'm not mistaken:
> > > > >
> > > > > ---
> > > > >  include/linux/cpufreq.h |    8 +++++---
> > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > >
> > > > > Index: linux-pm/include/linux/cpufreq.h
> > > > >
> > ==============================================================
> > ==
> > > > > ===
> > > > > --- linux-pm.orig/include/linux/cpufreq.h
> > > > > +++ linux-pm/include/linux/cpufreq.h
> > > > > @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_  {
> > > > >         /*
> > > > >          * Allow remote callbacks if:
> > > > > -        * - dvfs_possible_from_any_cpu flag is set
> > > > >          * - the local and remote CPUs share cpufreq policy
> > > > > +        * - dvfs_possible_from_any_cpu flag is set and the CPU running
> > the
> > > > > +        *   code is not going offline.
> > > > >          */
> > > > > -       return policy->dvfs_possible_from_any_cpu ||
> > > > > -               cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > > > > +       return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > > > > +               (policy->dvfs_possible_from_any_cpu &&
> > > > > +                !cpumask_test_cpu(smp_processor_id(), policy-
> > > > > >related_cpus));
> > > > >  }
> > > >
> > > > I will start a stress test of this patch, thanks!
> > >
> > > OK, thanks!
> > >
> > > Another patch to test is appended and it should be more robust.
> > >
> > > Instead of doing the related_cpus cpumask check in the previous patch
> > > (which only covered CPUs that belong to the target policy) it checks if
> > > the update_util hook is set for the local CPU (if it is not, that CPU
> > > is not expected to run the update_util code).
> >
> > One more thing.
> >
> > Both of the previous patches would not fix the schedutil governor in which
> > cpufreq_this_cpu_can_update() only is called in the fast_switch case and that
> > is not when irq_works are used.
> >
> > So please discard the patch I have just posted and here is an updated patch
> > that covers schedutil too, so please test this one instead.
> >
> > ---
> >  include/linux/cpufreq.h          |   11 -----------
> >  include/linux/sched/cpufreq.h    |    3 +++
> >  kernel/sched/cpufreq.c           |   18 ++++++++++++++++++
> >  kernel/sched/cpufreq_schedutil.c |    8 +++-----
> >  4 files changed, 24 insertions(+), 16 deletions(-)
> >
> > Index: linux-pm/include/linux/cpufreq.h
> > ==============================================================
> > =====
> > --- linux-pm.orig/include/linux/cpufreq.h
> > +++ linux-pm/include/linux/cpufreq.h
> > @@ -595,17 +595,6 @@ struct governor_attr {
> >                        size_t count);
> >  };
> >
> > -static inline bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> > -{
> > -     /*
> > -      * Allow remote callbacks if:
> > -      * - dvfs_possible_from_any_cpu flag is set
> > -      * - the local and remote CPUs share cpufreq policy
> > -      */
> > -     return policy->dvfs_possible_from_any_cpu ||
> > -             cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > -}
> > -
> >
> > /*************************************************************
> > ********
> >   *                     FREQUENCY TABLE HELPERS
> > *
> >
> > **************************************************************
> > *******/
> > Index: linux-pm/kernel/sched/cpufreq.c
> > ==============================================================
> > =====
> > --- linux-pm.orig/kernel/sched/cpufreq.c
> > +++ linux-pm/kernel/sched/cpufreq.c
> > @@ -5,6 +5,8 @@
> >   * Copyright (C) 2016, Intel Corporation
> >   * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >   */
> > +#include <linux/cpufreq.h>
> > +
> >  #include "sched.h"
> >
> >  DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
> > @@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
> >       rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
> > +
> > +/**
> > + * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
> > + * @policy: cpufreq policy to check.
> > + *
> > + * Return 'true' if:
> > + * - the local and remote CPUs share @policy,
> > + * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
> > + *   offline (in which case it is not expected to run cpufreq updates any more).
> > + */
> > +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> > +{
> > +     return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > +             (policy->dvfs_possible_from_any_cpu &&
> > +              rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
> > +}
> > Index: linux-pm/include/linux/sched/cpufreq.h
> > ==============================================================
> > =====
> > --- linux-pm.orig/include/linux/sched/cpufreq.h
> > +++ linux-pm/include/linux/sched/cpufreq.h
> > @@ -12,6 +12,8 @@
> >  #define SCHED_CPUFREQ_MIGRATION      (1U << 1)
> >
> >  #ifdef CONFIG_CPU_FREQ
> > +struct cpufreq_policy;
> > +
> >  struct update_util_data {
> >         void (*func)(struct update_util_data *data, u64 time, unsigned int
> > flags);  }; @@ -20,6 +22,7 @@ void cpufreq_add_update_util_hook(int cp
> >                         void (*func)(struct update_util_data *data, u64
> > time,
> >                                   unsigned int flags));
> >  void cpufreq_remove_update_util_hook(int cpu);
> > +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy);
> >
> >  static inline unsigned long map_util_freq(unsigned long util,
> >                                       unsigned long freq, unsigned long cap)
> > Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> > ==============================================================
> > =====
> > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> > +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> > @@ -82,12 +82,10 @@ static bool sugov_should_update_freq(str
> >        * by the hardware, as calculating the frequency is pointless if
> >        * we cannot in fact act on it.
> >        *
> > -      * For the slow switching platforms, the kthread is always scheduled on
> > -      * the right set of CPUs and any CPU can find the next frequency and
> > -      * schedule the kthread.
> > +      * This is needed on the slow switching platforms too to prevent CPUs
> > +      * going offline from leaving stale IRQ work items behind.
> >        */
> > -     if (sg_policy->policy->fast_switch_enabled &&
> > -         !cpufreq_this_cpu_can_update(sg_policy->policy))
> > +     if (!cpufreq_this_cpu_can_update(sg_policy->policy))
> >               return false;
> >
> >       if (unlikely(sg_policy->limits_changed)) {
>
> So with your patch we will not queue irq_work on the CPU that is going
> offline.
>
> When we hit the issue on CPU3, CPU3 had an irq_work pending, but the
> SGI IRQ_WORK interrupt was never handled because interrupts stayed
> disabled. See the stack below, captured in idle with IRQs disabled.
>
> [  227.344678] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.4.0-03554-gbb1159fa5556-dirty #95
> [  227.344682] Hardware name: Freescale i.MX8QXP MEK (DT)
> [  227.344686] Call trace:
> [  227.344701]  dump_backtrace+0x0/0x140
> [  227.344708]  show_stack+0x14/0x20
> [  227.344717]  dump_stack+0xb4/0xf8
> [  227.344730]  dbs_update_util_handler+0x150/0x180
> [  227.344739]  update_load_avg+0x38c/0x3c8
> [  227.344746]  enqueue_task_fair+0xcc/0x3a0
> [  227.344756]  activate_task+0x5c/0xa0
> [  227.344766]  ttwu_do_activate.isra.0+0x4c/0x70
> [  227.344776]  try_to_wake_up+0x2d8/0x410
> [  227.344786]  wake_up_process+0x14/0x20
> [  227.344794]  swake_up_locked.part.0+0x18/0x38
> [  227.344801]  swake_up_one+0x30/0x48
> [  227.344808]  rcu_gp_kthread_wake+0x5c/0x80
> [  227.344815]  rcu_report_qs_rsp+0x40/0x50
> [  227.344825]  rcu_report_qs_rnp+0x120/0x148
> [  227.344832]  rcu_report_dead+0x120/0x130
> [  227.344841]  cpuhp_report_idle_dead+0x3c/0x80
> [  227.344847]  do_idle+0x198/0x280
> [  227.344856]  cpu_startup_entry+0x24/0x40
> [  227.344865]  secondary_start_kernel+0x154/0x190
> [  227.344905] CPU3: shutdown
> [  227.444015] psci: CPU3 killed.
>
>
> I also saw an irq_work queued on CPU1 while it was going offline, but
> CPU1 did not trigger the issue, because the SGI IRQ_WORK interrupt was
> handled there.
>
> There are multiple paths that reach irq_work_queue() in
> dbs_update_util_handler(); some of them enable interrupts afterwards
> and some do not.
>
> It seems do_idle() is the only path that triggers cpu_die() for CPU
> hotplug on ARM/ARM64.
>
> So should we use idle as the flag that decides whether or not to queue
> irq_work? That way we could still queue irq_work on an offlining/offline
> CPU until it runs into idle on its way to cpu_die().

To be honest, I'm not sure what you mean.

In cpufreq we cannot just avoid queuing up an IRQ work, because we've
already made changes in preparation for it to run.  We need to decide
whether or not to carry out the entire utilization update upfront.

But preventing CPUs with NULL cpufreq_update_util_data pointers from
running cpufreq utilization update code at all (like in the last
patch) should be sufficient to address this problem entirely.  At
least I don't see why not.
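
For illustration only, the difference between the old and the new
check can be modelled in plain userspace C. This is a sketch under
assumptions, not the kernel code: the model_ names are hypothetical, a
bool array stands in for policy->cpus, and a plain pointer array stands
in for the per-CPU cpufreq_update_util_data pointers (NULL meaning the
hook has been removed for that CPU).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the relevant bits of struct cpufreq_policy. */
struct model_policy {
	bool cpus[MODEL_NR_CPUS];        /* models policy->cpus */
	bool dvfs_possible_from_any_cpu;
};

/* Models the per-CPU update_util hook pointers; NULL = hook removed. */
static const void *model_update_util_hook[MODEL_NR_CPUS];

/* Old check: the flag alone lets any CPU through. */
static bool model_old_can_update(const struct model_policy *p, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);
	return p->dvfs_possible_from_any_cpu || p->cpus[this_cpu];
}

/* New check: the flag only helps if the local CPU still has a hook set. */
static bool model_new_can_update(const struct model_policy *p, int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);
	return p->cpus[this_cpu] ||
	       (p->dvfs_possible_from_any_cpu &&
		model_update_util_hook[this_cpu] != NULL);
}
```

In this model, a CPU that has been dropped from the policy mask and has
a NULL hook still passes the old check whenever
dvfs_possible_from_any_cpu is set, while the new check rejects it.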

Thanks!

^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-11  9:36                                                             ` Rafael J. Wysocki
@ 2019-12-11  9:43                                                               ` Peng Fan
  2019-12-11  9:52                                                                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 57+ messages in thread
From: Peng Fan @ 2019-12-11  9:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Anson Huang, Viresh Kumar, Jacky Bai,
	linux-pm, Vincent Guittot, Peter Zijlstra, Paul McKenney

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Wed, Dec 11, 2019 at 9:59 AM Peng Fan <peng.fan@nxp.com> wrote:
> >
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > >
> > > On Tuesday, December 10, 2019 11:39:25 AM CET Rafael J. Wysocki
> wrote:
> > > > On Tuesday, December 10, 2019 9:51:43 AM CET Anson Huang wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Rafael J. Wysocki <rjw@rjwysocki.net>
> > > > > > Sent: Tuesday, December 10, 2019 4:51 PM
> > > > > > To: Anson Huang <anson.huang@nxp.com>
> > > > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>; Jacky
> > > > > > Bai <ping.bai@nxp.com>; linux-pm@vger.kernel.org; Vincent
> > > > > > Guittot <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > <peterz@infradead.org>; Paul McKenney
> > > <paulmck@linux.vnet.ibm.com>
> > > > > > Subject: Re: About CPU hot-plug stress test failed in cpufreq
> > > > > > driver
> > > > > >
> > > > > > On Tuesday, December 10, 2019 9:45:09 AM CET Anson Huang
> wrote:
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > > > Sent: Tuesday, December 10, 2019 4:38 PM
> > > > > > > > To: Anson Huang <anson.huang@nxp.com>
> > > > > > > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Viresh Kumar
> > > > > > > > <viresh.kumar@linaro.org>; Peng Fan <peng.fan@nxp.com>;
> > > > > > > > Rafael
> > > J.
> > > > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai <ping.bai@nxp.com>;
> > > > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > > > Subject: Re: About CPU hot-plug stress test failed in
> > > > > > > > cpufreq driver
> > > > > > > >
> > > > > > > > On Tue, Dec 10, 2019 at 9:29 AM Anson Huang
> > > > > > > > <anson.huang@nxp.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Rafael J. Wysocki <rafael@kernel.org>
> > > > > > > > > > Sent: Tuesday, December 10, 2019 4:22 PM
> > > > > > > > > > To: Viresh Kumar <viresh.kumar@linaro.org>
> > > > > > > > > > Cc: Peng Fan <peng.fan@nxp.com>; Rafael J. Wysocki
> > > > > > > > > > <rafael@kernel.org>; Anson Huang
> > > > > > > > > > <anson.huang@nxp.com>; Rafael
> > > > > > J.
> > > > > > > > > > Wysocki <rjw@rjwysocki.net>; Jacky Bai
> > > > > > > > > > <ping.bai@nxp.com>;
> > > > > > > > > > linux- pm@vger.kernel.org; Vincent Guittot
> > > > > > > > > > <vincent.guittot@linaro.org>; Peter Zijlstra
> > > > > > > > > > <peterz@infradead.org>; Paul McKenney
> > > > > > > > > > <paulmck@linux.vnet.ibm.com>
> > > > > > > > > > Subject: Re: About CPU hot-plug stress test failed in
> > > > > > > > > > cpufreq driver
> > > > > > > > > >
> > > > > > > > > > On Tue, Dec 10, 2019 at 8:05 AM Viresh Kumar
> > > > > > > > > > <viresh.kumar@linaro.org>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > +few more guys
> > > > > > > > > > >
> > > > > > > > > > > On 10-12-19, 05:53, Peng Fan wrote:
> > > > > > > > > > > > But per
> > > > > > > > > > > > https://elixir.bootlin.com/linux/v5.5-rc1/source/kernel/sched/sched.h#L2293
> > > > > > > > > > > > cpu_of(rq) and smp_processor_id() are not necessarily
> > > > > > > > > > > > the same.
> > > > > > > > > > > >
> > > > > > > > > > > > When cpu_of(rq) is not equal to
> > > > > > > > > > > > smp_processor_id(), dbs_update_util_handler will
> > > > > > > > > > > > use irq_work_queue to smp_processor_id(), not
> cpu_of(rq). Is this expected?
> > > > > > > > > > > > Or should the irq_work be queued to cpu_of(rq)?
> > > > > > > > > > >
> > > > > > > > > > > Okay, sorry for the long weekend where I couldn't
> > > > > > > > > > > get time to reply at all.
> > > > > > > > > >
> > > > > > > > > > No worries. :-)
> > > > > > > > > >
> > > > > > > > > > > First of all, lets try to understand dvfs_possible_from_any_cpu.
> > > > > > > > > > >
> > > > > > > > > > > Who can update the frequency of a CPU ? For many
> > > > > > > > > > > architectures/platforms the eventual code that
> > > > > > > > > > > writes to some register to change the frequency
> > > > > > > > > > > should only run on the local CPU, as these registers
> > > > > > > > > > > are per-cpu registers and not something shared
> > > > > > > > > > > between CPUs.
> > > > > > > > > > >
> > > > > > > > > > > But for the ARM architecture, we have a PLL and then
> > > > > > > > > > > some more registers to play with the clk provided to
> > > > > > > > > > > the CPU blocks and these registers (which are
> > > > > > > > > > > updated as a result of clk_set_rate()) are part of a
> > > > > > > > > > > block outside of the CPU blocks.
> > > > > > > > > > > And so any CPU (even if it is not part of the same
> > > > > > > > > > > cpufreq policy) can update it. Setting this flag allows
> > > > > > > > > > > that and eventually we may end up updating the frequency
> > > > > > > > > > > sooner, instead of later (which may be less
> > > > > > > > > > > effective). That was the idea of the remote-wakeup series.
> > > > > > > > > > > This stuff is absolutely correct and so cpufreq-dt
> > > > > > > > > > > does it for everyone.
> > > > > > > > > > >
> > > > > > > > > > > This also means that the normal work and irq-work
> > > > > > > > > > > both can run on any CPU for your platform and it
> > > > > > > > > > > should be okay to do that.
> > > > > > > > > >
> > > > > > > > > > And in the failing case all of the CPUs in the system
> > > > > > > > > > are in the same policy anyway, so
> > > > > > > > > > dvfs_possible_from_any_cpu is a red herring.
> > > > > > > > > >
> > > > > > > > > > > Now, we have necessary measures in place to make
> > > > > > > > > > > sure that after stopping and before starting a
> > > > > > > > > > > governor, the scheduler hooks to save the cpufreq
> > > > > > > > > > > governor pointer and updates to policy->cpus are
> > > > > > > > > > > made properly, to make sure that we never ever
> > > > > > > > > > > schedule a work or irq-work on a CPU which is offline.
> > > > > > > > > > > Now it looks like this isn't working as expected and
> > > > > > > > > > > we need to find what exactly is broken here.
> > > > > > > > > > >
> > > > > > > > > > > And yes, I did the testing on Hikey 620, an
> > > > > > > > > > > octa-core ARM platform which has a single cpufreq
> > > > > > > > > > > policy which has all the 8 CPUs. And yes, I am using
> > > > > > > > > > > cpufreq-dt only and I wasn't able to reproduce the
> > > > > > > > > > > problem with mainline kernel as I explained earlier.
> > > > > > > > > > >
> > > > > > > > > > > The problem is somewhere between the scheduler's
> > > > > > > > > > > governor hook running
> > > > > > > > > > > or queuing work on a CPU which is in the middle of
> > > > > > > > > > > getting offline/online and there is some race around
> > > > > > > > > > > that. The problem hence may not be related to just
> > > > > > > > > > > cpufreq, but a wider variety of clients.
> > > > > > > > > >
> > > > > > > > > > The problem is that a CPU is running a governor hook
> > > > > > > > > > which it shouldn't be running at all.
> > > > > > > > > >
> > > > > > > > > > The observation that dvfs_possible_from_any_cpu makes
> > > > > > > > > > a difference only means that the governor hook is
> > > > > > > > > > running on a CPU that is not present in the
> > > > > > > > > > policy->cpus mask.  On the platform(s) in question
> > > > > > > > > > this cannot happen as long as RCU works as expected.
> > > > > > > > >
> > > > > > > > > If I understand correctly, the governor hook is ONLY
> > > > > > > > > cleared on the CPU going offline, after the governor
> > > > > > > > > is stopped, but the CPU going offline can still run
> > > > > > > > > into the function below to help another CPU update its
> > > > > > > > > util, and that function ONLY checks the governor hook
> > > > > > > > > of cpu_of(rq), which is valid as that CPU is online.
> > > > > > > > >
> > > > > > > > > So the question is how to prevent a CPU that is going
> > > > > > > > > offline, and has already finished the governor stop
> > > > > > > > > flow, from being used to help another CPU update its
> > > > > > > > > util.
> > > > > > > > >
> > > > > > > > >  static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
> > > > > > > > >  {
> > > > > > > > >          struct update_util_data *data;
> > > > > > > > >
> > > > > > > > >          data = rcu_dereference_sched(*per_cpu_ptr(&cpufreq_update_util_data,
> > > > > > > > >                                                    cpu_of(rq)));
> > > > > > > > >          if (data)
> > > > > > > > >                  data->func(data, rq_clock(rq), flags);
> > > > > > > > > }
> > > > > > > >
> > > > > > > > OK, so that's where the problem is, good catch!
> > > > > > > >
> > > > > > > > So what happens is that a CPU going offline runs some
> > > > > > > > scheduler code that invokes cpufreq_update_util().
> > > > > > > > Incidentally, it is not the cpu_of(rq), but that CPU is
> > > > > > > > still online, so the callback is invoked and then
> > > > > > > > policy->cpus test is bypassed because of
> > > > > > > > dvfs_possible_from_any_cpu.
> > > > > > >
> > > > > > > If this is the issue, add another check here for the current
> > > > > > > CPU's governor hook?
> > > > > > > Or any other better place to make sure the CPU being offline
> > > > > > > NOT to be queued to irq work?
> > > > > >
> > > > > > Generally, yes.
> > > > > >
> > > > > > Something like the patch below should help if I'm not mistaken:
> > > > > >
> > > > > > ---
> > > > > >  include/linux/cpufreq.h |    8 +++++---
> > > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > Index: linux-pm/include/linux/cpufreq.h
> > > > > > ===================================================================
> > > > > > --- linux-pm.orig/include/linux/cpufreq.h
> > > > > > +++ linux-pm/include/linux/cpufreq.h
> > > > > > @@ -599,11 +599,13 @@ static inline bool cpufreq_this_cpu_can_
> > > > > >  {
> > > > > >         /*
> > > > > >          * Allow remote callbacks if:
> > > > > > -        * - dvfs_possible_from_any_cpu flag is set
> > > > > >          * - the local and remote CPUs share cpufreq policy
> > > > > > +        * - dvfs_possible_from_any_cpu flag is set and the CPU running the
> > > > > > +        *   code is not going offline.
> > > > > >          */
> > > > > > -       return policy->dvfs_possible_from_any_cpu ||
> > > > > > -               cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > > > > > +       return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > > > > > +               (policy->dvfs_possible_from_any_cpu &&
> > > > > > +                !cpumask_test_cpu(smp_processor_id(), policy->related_cpus));
> > > > > >  }
> > > > >
> > > > > I will start a stress test of this patch, thanks!
> > > >
> > > > OK, thanks!
> > > >
> > > > Another patch to test is appended and it should be more robust.
> > > >
> > > > Instead of doing the related_cpus cpumask check in the previous
> > > > patch (which only covered CPUs that belong to the target policy) it
> > > > checks if the update_util hook is set for the local CPU (if it is
> > > > not, that CPU is not expected to run the update_util code).
> > >
> > > One more thing.
> > >
> > > Both of the previous patches would not fix the schedutil governor in
> > > which cpufreq_this_cpu_can_update() only is called in the fast_switch
> > > case and that is not when irq_works are used.
> > >
> > > So please discard the patch I have just posted and here is an
> > > updated patch that covers schedutil too, so please test this one instead.
> > >
> > > ---
> > >  include/linux/cpufreq.h          |   11 -----------
> > >  include/linux/sched/cpufreq.h    |    3 +++
> > >  kernel/sched/cpufreq.c           |   18 ++++++++++++++++++
> > >  kernel/sched/cpufreq_schedutil.c |    8 +++-----
> > >  4 files changed, 24 insertions(+), 16 deletions(-)
> > >
> > > Index: linux-pm/include/linux/cpufreq.h
> > > ===================================================================
> > > --- linux-pm.orig/include/linux/cpufreq.h
> > > +++ linux-pm/include/linux/cpufreq.h
> > > @@ -595,17 +595,6 @@ struct governor_attr {
> > >                        size_t count);
> > >  };
> > >
> > > -static inline bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> > > -{
> > > -     /*
> > > -      * Allow remote callbacks if:
> > > -      * - dvfs_possible_from_any_cpu flag is set
> > > -      * - the local and remote CPUs share cpufreq policy
> > > -      */
> > > -     return policy->dvfs_possible_from_any_cpu ||
> > > -             cpumask_test_cpu(smp_processor_id(), policy->cpus);
> > > -}
> > > -
> > >  /*********************************************************************
> > >   *                     FREQUENCY TABLE HELPERS                       *
> > >   *********************************************************************/
> > > Index: linux-pm/kernel/sched/cpufreq.c
> > > ===================================================================
> > > --- linux-pm.orig/kernel/sched/cpufreq.c
> > > +++ linux-pm/kernel/sched/cpufreq.c
> > > @@ -5,6 +5,8 @@
> > >   * Copyright (C) 2016, Intel Corporation
> > >   * Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >   */
> > > +#include <linux/cpufreq.h>
> > > +
> > >  #include "sched.h"
> > >
> > >  DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
> > > @@ -57,3 +59,19 @@ void cpufreq_remove_update_util_hook(int
> > >       rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), NULL);
> > >  }
> > >  EXPORT_SYMBOL_GPL(cpufreq_remove_update_util_hook);
> > > +
> > > +/**
> > > + * cpufreq_this_cpu_can_update - Check if cpufreq policy can be updated.
> > > + * @policy: cpufreq policy to check.
> > > + *
> > > + * Return 'true' if:
> > > + * - the local and remote CPUs share @policy,
> > > + * - dvfs_possible_from_any_cpu is set in @policy and the local CPU is not going
> > > + *   offline (in which it is not expected to run cpufreq updates any more).
> > > + */
> > > +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
> > > +{
> > > +     return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
> > > +             (policy->dvfs_possible_from_any_cpu &&
> > > +              rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
> > > +}
> > > Index: linux-pm/include/linux/sched/cpufreq.h
> > > ===================================================================
> > > --- linux-pm.orig/include/linux/sched/cpufreq.h
> > > +++ linux-pm/include/linux/sched/cpufreq.h
> > > @@ -12,6 +12,8 @@
> > >  #define SCHED_CPUFREQ_MIGRATION      (1U << 1)
> > >
> > >  #ifdef CONFIG_CPU_FREQ
> > > +struct cpufreq_policy;
> > > +
> > >  struct update_util_data {
> > >         void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
> > >  };
> > > @@ -20,6 +22,7 @@ void cpufreq_add_update_util_hook(int cp
> > >                         void (*func)(struct update_util_data *data, u64 time,
> > >                                   unsigned int flags));
> > >  void cpufreq_remove_update_util_hook(int cpu);
> > > +bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy);
> > >
> > >  static inline unsigned long map_util_freq(unsigned long util,
> > >                                       unsigned long freq, unsigned long cap)
> > > Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> > > ===================================================================
> > > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> > > +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> > > @@ -82,12 +82,10 @@ static bool sugov_should_update_freq(str
> > >        * by the hardware, as calculating the frequency is pointless if
> > >        * we cannot in fact act on it.
> > >        *
> > > -      * For the slow switching platforms, the kthread is always scheduled on
> > > -      * the right set of CPUs and any CPU can find the next frequency and
> > > -      * schedule the kthread.
> > > +      * This is needed on the slow switching platforms too to prevent CPUs
> > > +      * going offline from leaving stale IRQ work items behind.
> > >        */
> > > -     if (sg_policy->policy->fast_switch_enabled &&
> > > -         !cpufreq_this_cpu_can_update(sg_policy->policy))
> > > +     if (!cpufreq_this_cpu_can_update(sg_policy->policy))
> > >               return false;
> > >
> > >       if (unlikely(sg_policy->limits_changed)) {
> >
> > So we will not queue irq_work on the offlining CPU with your patch.
> >
> > When we hit the issue on CPU3, CPU3 had irq_work pending, but the
> > SGI IRQ_WORK interrupt was never handled because IRQs stayed disabled;
> > see the stack below, taken in the idle task with IRQs disabled.
> >
> > [  227.344678] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.4.0-03554-gbb1159fa5556-dirty #95
> > [  227.344682] Hardware name: Freescale i.MX8QXP MEK (DT)
> > [  227.344686] Call trace:
> > [  227.344701]  dump_backtrace+0x0/0x140
> > [  227.344708]  show_stack+0x14/0x20
> > [  227.344717]  dump_stack+0xb4/0xf8
> > [  227.344730]  dbs_update_util_handler+0x150/0x180
> > [  227.344739]  update_load_avg+0x38c/0x3c8
> > [  227.344746]  enqueue_task_fair+0xcc/0x3a0
> > [  227.344756]  activate_task+0x5c/0xa0
> > [  227.344766]  ttwu_do_activate.isra.0+0x4c/0x70
> > [  227.344776]  try_to_wake_up+0x2d8/0x410
> > [  227.344786]  wake_up_process+0x14/0x20
> > [  227.344794]  swake_up_locked.part.0+0x18/0x38
> > [  227.344801]  swake_up_one+0x30/0x48
> > [  227.344808]  rcu_gp_kthread_wake+0x5c/0x80
> > [  227.344815]  rcu_report_qs_rsp+0x40/0x50
> > [  227.344825]  rcu_report_qs_rnp+0x120/0x148
> > [  227.344832]  rcu_report_dead+0x120/0x130
> > [  227.344841]  cpuhp_report_idle_dead+0x3c/0x80
> > [  227.344847]  do_idle+0x198/0x280
> > [  227.344856]  cpu_startup_entry+0x24/0x40
> > [  227.344865]  secondary_start_kernel+0x154/0x190
> > [  227.344905] CPU3: shutdown
> > [  227.444015] psci: CPU3 killed.
> >
> >
> > I also saw irq_work queued on CPU1 while it was going offline, but that
> > did not trigger the issue, because the SGI IRQ_WORK interrupt was handled.
> >
> > There are multiple paths into the irq_work_queue() call in
> > dbs_update_util_handler(); on some of them interrupts get enabled, on
> > others they might not.
> >
> > It seems do_idle is the only path that will trigger cpu_die with
> > HOTPLUG on ARM/ARM64.
> >
> > So do we need to use idle as a flag to decide whether to queue irq_work?
> > That way we could still inject irq work on an offlining/offline CPU
> > until it runs into idle and reaches cpu_die.
> 
> To be honest, I'm not sure what you mean.

Sorry for not being clear. I wrote a trivial patch to verify that the issue
only happens when the code runs in idle task context.

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 4bb054d0cb43..85e78da7fa2e 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -15,6 +15,7 @@

 #include <linux/export.h>
 #include <linux/kernel_stat.h>
+#include <linux/sched.h>
 #include <linux/slab.h>

 #include "cpufreq_governor.h"
@@ -316,6 +317,10 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,

        policy_dbs->last_sample_time = time;
        policy_dbs->work_in_progress = true;
+#ifdef CONFIG_HOTPLUG_CPU
+       if (is_idle_task(current))
+               return;
+#endif
        irq_work_queue(&policy_dbs->irq_work);
 }

With this change it passed 5000+ iterations of the CPU online/offline test.
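
For reference, a minimal user-space sketch (not kernel code) of what the early
return in the debug patch above does to the governor's bookkeeping. It assumes
the handler bails out while work_in_progress is set and that the flag is
normally cleared only by the queued work item; all model_* names are
hypothetical stand-ins, and the real stop/start paths may reset the flag in
ways this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-policy governor state. */
struct model_policy_dbs {
	bool work_in_progress;	/* set by the handler, cleared by the work item */
	unsigned int queued;	/* number of modelled irq_work_queue() calls */
};

/* Models the tail of the update handler with the debug patch applied. */
static void model_update_util(struct model_policy_dbs *p, bool in_idle_task)
{
	if (p->work_in_progress)
		return;		/* an update is believed to be in flight */

	p->work_in_progress = true;
	if (in_idle_task)
		return;		/* debug patch: skip queuing the irq_work */
	p->queued++;		/* stands in for irq_work_queue() */
}

/* Models the queued work item finishing and releasing the flag. */
static void model_work_done(struct model_policy_dbs *p)
{
	assert(p->work_in_progress);
	p->work_in_progress = false;
}
```

In this model, one call from idle context sets work_in_progress without
queuing anything, so later calls from any context return early until
something else clears the flag. That is acceptable for a verification-only
patch, but it is a reason to make the skip decision before any state changes.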

> 
> In cpufreq we cannot just avoid queuing up an IRQ work, because we've
> already made changes in preparation for it to run.  We need to decide
> whether or not to carry out the entire utilization update upfront.
> 
> But preventing CPUs with NULL cpufreq_update_util_data pointers from
> running cpufreq utilization update code at all (like in the last
> patch) should be sufficient to address this problem entirely.  At least I don't
> see why not.

I just think the scheduler wants to inject irq_work on the CPU even during the
CPU offlining process, but with your patch we do not inject the irq work. Is this right?

Thanks,
Peng.

> 
> Thanks!


* Re: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-11  9:43                                                               ` Peng Fan
@ 2019-12-11  9:52                                                                 ` Rafael J. Wysocki
  2019-12-11 10:11                                                                   ` Peng Fan
  0 siblings, 1 reply; 57+ messages in thread
From: Rafael J. Wysocki @ 2019-12-11  9:52 UTC (permalink / raw)
  To: Peng Fan
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Anson Huang, Viresh Kumar,
	Jacky Bai, linux-pm, Vincent Guittot, Peter Zijlstra,
	Paul McKenney

On Wed, Dec 11, 2019 at 10:43 AM Peng Fan <peng.fan@nxp.com> wrote:
>
> > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> >
> > On Wed, Dec 11, 2019 at 9:59 AM Peng Fan <peng.fan@nxp.com> wrote:

[cut]

> > But preventing CPUs with NULL cpufreq_update_util_data pointers from
> > running cpufreq utilization update code at all (like in the last
> > patch) should be sufficient to address this problem entirely.  At least I don't
> > see why not.
>
> I just think the scheduler wants to inject irq_work on the CPU even during the
> CPU offlining process, but with your patch we do not inject the irq work. Is this right?

Yes, but that means that we just avoid a cross-update which would be
discarded due to the policy->cpus mask check without
dvfs_possible_from_any_cpu.

The target online CPU will run this update by itself eventually anyway.
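
A minimal user-space sketch of the before/after check, under stated
assumptions: a bitmask stands in for policy->cpus, a boolean array stands in
for the per-CPU cpufreq_update_util_data pointers, and every model_* name is
hypothetical rather than a kernel symbol.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for the two cpufreq_policy fields the check reads. */
struct model_policy {
	unsigned int cpus_mask;			/* bit n set: CPU n is in policy->cpus */
	bool dvfs_possible_from_any_cpu;
};

/* true if the modelled CPU still has its update_util hook installed */
static bool model_hook_set[MODEL_NR_CPUS];

/* Old check: the flag alone lets any CPU run the update. */
static bool model_can_update_old(const struct model_policy *p, unsigned int this_cpu)
{
	assert(this_cpu < MODEL_NR_CPUS);
	return p->dvfs_possible_from_any_cpu ||
	       (p->cpus_mask & (1u << this_cpu)) != 0;
}

/* New check: a CPU outside policy->cpus also needs its own hook to be set. */
static bool model_can_update_new(const struct model_policy *p, unsigned int this_cpu)
{
	assert(this_cpu < MODEL_NR_CPUS);
	return (p->cpus_mask & (1u << this_cpu)) != 0 ||
	       (p->dvfs_possible_from_any_cpu && model_hook_set[this_cpu]);
}
```

With CPUs 0-2 in the policy and CPU3 going offline (hook already removed), the
old check still lets CPU3 run the update on behalf of another CPU, while the
new check rejects it, so no irq_work is left behind on CPU3.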


* RE: About CPU hot-plug stress test failed in cpufreq driver
  2019-12-11  9:52                                                                 ` Rafael J. Wysocki
@ 2019-12-11 10:11                                                                   ` Peng Fan
  0 siblings, 0 replies; 57+ messages in thread
From: Peng Fan @ 2019-12-11 10:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Anson Huang, Viresh Kumar, Jacky Bai,
	linux-pm, Vincent Guittot, Peter Zijlstra, Paul McKenney

> Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> 
> On Wed, Dec 11, 2019 at 10:43 AM Peng Fan <peng.fan@nxp.com> wrote:
> >
> > > Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
> > >
> > > On Wed, Dec 11, 2019 at 9:59 AM Peng Fan <peng.fan@nxp.com> wrote:
> 
> [cut]
> 
> > > But preventing CPUs with NULL cpufreq_update_util_data pointers from
> > > running cpufreq utilization update code at all (like in the last
> > > patch) should be sufficient to address this problem entirely.  At
> > > least I don't see why not.
> >
> > I just think the scheduler wants to inject irq_work on the CPU even
> > during the CPU offlining process, but with your patch we do not inject
> > the irq work. Is this right?
> 
> Yes, but that means that we just avoid a cross-update which would be
> discarded due to the policy->cpus mask check without
> dvfs_possible_from_any_cpu.

Understood.

> 
> The target online CPU will run this update by itself eventually anyway.

Yes. Thanks for the explanation.

Thanks,
Peng.

Thread overview: 57+ messages
     [not found] <DB3PR0402MB391626A8ECFDC182C6EDCF8DF54E0@DB3PR0402MB3916.eurprd04.prod.outlook.com>
2019-11-21  9:35 ` About CPU hot-plug stress test failed in cpufreq driver Viresh Kumar
2019-11-21 10:13   ` Anson Huang
2019-11-21 10:53     ` Rafael J. Wysocki
2019-11-21 10:56       ` Rafael J. Wysocki
2019-11-22  5:15         ` Anson Huang
2019-11-22  9:59           ` Rafael J. Wysocki
2019-11-25  6:05             ` Anson Huang
2019-11-25  9:43               ` Anson Huang
2019-11-26  6:18                 ` Viresh Kumar
2019-11-26  8:22                   ` Anson Huang
2019-11-26  8:25                     ` Viresh Kumar
2019-11-25 12:44               ` Rafael J. Wysocki
2019-11-26  8:57                 ` Rafael J. Wysocki
2019-11-29 11:39                 ` Rafael J. Wysocki
2019-11-29 13:44                   ` Anson Huang
2019-12-05  8:53                     ` Anson Huang
2019-12-05 10:48                       ` Rafael J. Wysocki
2019-12-05 13:18                         ` Anson Huang
2019-12-05 15:52                           ` Rafael J. Wysocki
2019-12-09 10:31                             ` Peng Fan
2019-12-09 10:37                             ` Anson Huang
2019-12-09 10:56                               ` Anson Huang
2019-12-09 11:23                                 ` Rafael J. Wysocki
2019-12-09 12:32                                   ` Anson Huang
2019-12-09 12:44                                     ` Rafael J. Wysocki
2019-12-09 14:18                                       ` Anson Huang
2019-12-10  5:39                                         ` Anson Huang
2019-12-10  5:53                                       ` Peng Fan
2019-12-10  7:05                                         ` Viresh Kumar
2019-12-10  8:22                                           ` Rafael J. Wysocki
2019-12-10  8:29                                             ` Anson Huang
2019-12-10  8:36                                               ` Viresh Kumar
2019-12-10  8:37                                                 ` Peng Fan
2019-12-10  8:37                                               ` Rafael J. Wysocki
2019-12-10  8:43                                                 ` Peng Fan
2019-12-10  8:45                                                 ` Anson Huang
2019-12-10  8:50                                                   ` Rafael J. Wysocki
2019-12-10  8:51                                                     ` Anson Huang
2019-12-10 10:39                                                       ` Rafael J. Wysocki
2019-12-10 10:54                                                         ` Rafael J. Wysocki
2019-12-11  5:08                                                           ` Anson Huang
2019-12-11  8:59                                                           ` Peng Fan
2019-12-11  9:36                                                             ` Rafael J. Wysocki
2019-12-11  9:43                                                               ` Peng Fan
2019-12-11  9:52                                                                 ` Rafael J. Wysocki
2019-12-11 10:11                                                                   ` Peng Fan
2019-12-10 10:54                                                         ` Viresh Kumar
2019-12-10 11:07                                                           ` Rafael J. Wysocki
2019-12-10  8:57                                                     ` Viresh Kumar
2019-12-10 11:03                                                       ` Rafael J. Wysocki
2019-12-10  9:04                                                     ` Rafael J. Wysocki
2019-12-10  8:31                                             ` Viresh Kumar
2019-12-10  8:12                                         ` Rafael J. Wysocki
2019-12-05 11:00                       ` Viresh Kumar
2019-12-05 11:10                         ` Rafael J. Wysocki
2019-12-05 11:17                           ` Viresh Kumar
2019-11-21 10:37   ` Rafael J. Wysocki
