From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S964828AbcATKRB (ORCPT ); Wed, 20 Jan 2016 05:17:01 -0500
Received: from foss.arm.com ([217.140.101.70]:44286 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S934187AbcATKQ4 (ORCPT ); Wed, 20 Jan 2016 05:16:56 -0500
Date: Wed, 20 Jan 2016 10:17:24 +0000
From: Juri Lelli
To: Viresh Kumar
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	peterz@infradead.org, rjw@rjwysocki.net, mturquette@baylibre.com,
	steve.muckle@linaro.org, vincent.guittot@linaro.org,
	morten.rasmussen@arm.com, dietmar.eggemann@arm.com
Subject: Re: [RFC PATCH 15/19] cpufreq: remove useless usage of
	cpufreq_governor_mutex in __cpufreq_governor
Message-ID: <20160120101724.GM8573@e106622-lin>
References: <1452533760-13787-1-git-send-email-juri.lelli@arm.com>
	<1452533760-13787-16-git-send-email-juri.lelli@arm.com>
	<20160112110658.GG1084@ubuntu>
	<20160115163031.GU18603@e106622-lin>
	<20160118055034.GC30762@vireshk>
	<20160119164941.GI8573@e106622-lin>
	<20160120072933.GF22443@vireshk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160120072933.GF22443@vireshk>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 20/01/16 12:59, Viresh Kumar wrote:
> On 19-01-16, 16:49, Juri Lelli wrote:
> > I'm actually hitting this running sp2, on linux-pm/linux-next :/.
> 
> That's really bad .. Are you hitting this on Juno or x86 ?
> 

That's on TC2. I'll try to run the same on Juno and x86.
> And I am sure you would have hit that with your changes as well, but
> now its on the currently merged patches :(
> 
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 4.4.0+ #445 Not tainted
> > -------------------------------------------------------
> > trace.sh/1723 is trying to acquire lock:
> >  (s_active#48){++++.+}, at: [] kernfs_remove_by_name_ns+0x4c/0x94
> > 
> > but task is already holding lock:
> >  (od_dbs_cdata.mutex){+.+.+.}, at: [] cpufreq_governor_dbs+0x34/0x5d4
> > 
> > which lock already depends on the new lock.
> > 
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #2 (od_dbs_cdata.mutex){+.+.+.}:
> >        [] mutex_lock_nested+0x7c/0x434
> >        [] cpufreq_governor_dbs+0x34/0x5d4
> >        [] return_to_handler+0x0/0x18
> > 
> > -> #1 (&policy->rwsem){+++++.}:
> >        [] down_read+0x58/0x94
> >        [] show+0x30/0x60
> >        [] sysfs_kf_seq_show+0x90/0xfc
> >        [] kernfs_seq_show+0x34/0x38
> >        [] seq_read+0x1e4/0x4e4
> >        [] kernfs_fop_read+0x120/0x1a0
> >        [] __vfs_read+0x3c/0xe0
> >        [] vfs_read+0x98/0x104
> >        [] SyS_read+0x50/0x90
> >        [] ret_fast_syscall+0x0/0x1c
> > 
> > -> #0 (s_active#48){++++.+}:
> >        [] lock_acquire+0xd4/0x20c
> >        [] __kernfs_remove+0x288/0x328
> >        [] kernfs_remove_by_name_ns+0x4c/0x94
> >        [] remove_files+0x44/0x88
> >        [] sysfs_remove_group+0x50/0xa4
> >        [] cpufreq_governor_dbs+0x3f0/0x5d4
> >        [] return_to_handler+0x0/0x18
> > 
> > other info that might help us debug this:
> > 
> > Chain exists of:
> >  s_active#48 --> &policy->rwsem --> od_dbs_cdata.mutex
> > 
> >  Possible unsafe locking scenario:
> > 
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(od_dbs_cdata.mutex);
> >                                lock(&policy->rwsem);
> >                                lock(od_dbs_cdata.mutex);
> >   lock(s_active#48);
> > 
> >  *** DEADLOCK ***
> > 
> > 5 locks held by trace.sh/1723:
> >  #0:  (sb_writers#6){.+.+.+}, at: [] __sb_start_write+0xb4/0xc0
> >  #1:  (&of->mutex){+.+.+.}, at: [] kernfs_fop_write+0x6c/0x1c8
> >  #2:  (s_active#35){.+.+.+}, at: [] kernfs_fop_write+0x74/0x1c8
> >  #3:  (cpu_hotplug.lock){++++++}, at: [] get_online_cpus+0x48/0xb8
> >  #4:  (od_dbs_cdata.mutex){+.+.+.}, at: [] cpufreq_governor_dbs+0x34/0x5d4
> > 
> > stack backtrace:
> > CPU: 2 PID: 1723 Comm: trace.sh Not tainted 4.4.0+ #445
> > Hardware name: ARM-Versatile Express
> > [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
> > [] (show_stack) from [] (dump_stack+0x80/0xb4)
> > [] (dump_stack) from [] (print_circular_bug+0x29c/0x2f0)
> > [] (print_circular_bug) from [] (__lock_acquire+0x163c/0x1d74)
> > [] (__lock_acquire) from [] (lock_acquire+0xd4/0x20c)
> > [] (lock_acquire) from [] (__kernfs_remove+0x288/0x328)
> > [] (__kernfs_remove) from [] (kernfs_remove_by_name_ns+0x4c/0x94)
> > [] (kernfs_remove_by_name_ns) from [] (remove_files+0x44/0x88)
> > [] (remove_files) from [] (sysfs_remove_group+0x50/0xa4)
> > [] (sysfs_remove_group) from [] (cpufreq_governor_dbs+0x3f0/0x5d4)
> > [] (cpufreq_governor_dbs) from [] (return_to_handler+0x0/0x18)
> > 
> > Now, I couldn't yet make sense of this, but it seems to be
> > triggered by setting ondemand, printing its attributes and then
> > switching to conservative (that's what sp2 does, right?). Also, s_active
> > seems to come into play only when lockdep is enabled. Are you seeing
> > this as well?
> 
> There is something about the platform you are running this on.. I
> don't hit it most of the times in my exynos board (Dual A15), but x86
> and powerpc guys used to report this all the time. I have tried with
> both have-governor-per-policy and otherwise.
> 
> I have explained something similar in the earlier commits I pointed to
> you, here is the commit log:
> 
> http://pastebin.com/JbEJBLzU
> 

Yeah, saw that. I guess I have to stare at this thing more.

Thanks,

- Juri