[RESEND] On Tue, Jul 12, 2022 at 5:34 AM Steven Rostedt wrote: > > On Mon, 11 Jul 2022 10:58:28 -1000 > Tejun Heo wrote: > > > I don't think lockdep would be able to track CPU1 -> CPU2 dependency here > > unfortunately. > > > > > AFAIU: > > > > > > > > > CPU0 CPU1 CPU2 > > > > > > // attach task to a different > > > // cpuset cgroup via sysfs > > > __acquire(cgroup_threadgroup_rwsem) > > > > > > // pring up CPU2 online > > > __acquire(cpu_hotplug_lock) > > > // wait for CPU2 to come online > > Should there be some annotation here that tells lockdep that CPU1 is now > blocked on CPU2? > > Then this case would be caught by lockdep. > > -- Steve > > > > > // bringup cpu online > > > // call cpufreq_online() which tries to create sugov kthread > > > __acquire(cpu_hotplug_lock) copy_process() > > > cgroup_can_fork() > > > cgroup_css_set_fork() > > > __acquire(cgroup_threadgroup_rwsem) > > > // blocks forever // blocks forever // blocks forever > > > Indeed, It's caused by threads instead of cpus. Our soc contains two cpufreq policy.0-5 belongs to policy0, 6-7 belongs to policy1. when cpu6/7 online Thread-A Thread-B cgroup_file_write device_online cgroup1_tasks_write ... __cgroup1_procs_write _cpu_up write(&cgroup_threadgroup_rwsem); << cpus_write_lock();<< cgroup_attach_task ...... cgroup_migrate_execute cpuhp_kick_ap cpuset_attach //wakeup cpuhp cpus_read_lock() //waitingfor cpuhp cpuhp/6 kthreadd cpuhp_thread_fun cpuhp_invoke_callback cpuhp_cpufreq_online cpufreq_online sugov_init __kthread_create_on_node copy_process //blocked, waiting for kthreadd cgroup_can_fork cgroup_css_set_fork __acquires(&cgroup_threadgroup_rwsem) //blocked So it's logic is: Thread-A --->Thread-B---->cpuhp--->kthreadd---->Thread-A When cpu offline, the sugov thread would stop, so it would waiting cgroup_threadgroup_rwsem when kthread_stop(); It's logic is: Thread-A --->Thread-B---->cpuhp--->sugov---->Thread-A As Qais said: > if there's anything else that creates a kthread when a cpu goes online/offline > then we'll hit the same problem again. Indeed, only the cpuhp thread create/destroy kthread would cause the case. I have put the test script in the mail, and I have tested it without monkey test, the deadlock still occurs.. Thanks! xuewen.yan