From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752006AbcA0DKL (ORCPT ); Tue, 26 Jan 2016 22:10:11 -0500
Received: from mail-pf0-f170.google.com ([209.85.192.170]:35257 "EHLO
	mail-pf0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751333AbcA0DKH (ORCPT ); Tue, 26 Jan 2016 22:10:07 -0500
Date: Wed, 27 Jan 2016 08:40:03 +0530
From: Viresh Kumar
To: Juri Lelli
Cc: Rafael Wysocki, linaro-kernel@lists.linaro.org,
	linux-pm@vger.kernel.org, "# v4 . 2+", open list
Subject: Re: [PATCH] cpufreq: Fix NULL reference crash while accessing policy->governor_data
Message-ID: <20160127031003.GH3322@vireshk>
References: <1297c8fc8135f8b5359f9c49d220a939c0ee640e.1453741314.git.viresh.kumar@linaro.org>
	<20160126095751.GJ10898@e106622-lin>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160126095751.GJ10898@e106622-lin>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 26-01-16, 09:57, Juri Lelli wrote:
> This patch fixes the crash I was seeing.
>
> Tested-by: Juri Lelli

Thanks.

> However, it exposes another problem (running the concurrent lockdep test

It exposes? How can this patch expose the crash below? AFAIR, you
reported that you are getting the crash below on plain mainline on TC2,
i.e. for drivers with the governor-per-policy flag set.

The reason is obvious: with the flag set, the governor's sysfs directory
is present in cpus/cpuX/cpufreq/ instead of cpus/cpufreq/, which used to
be the case without the flag. This forces the show()/store() routines
present in cpufreq.c to be called, and those also take policy->rwsem.

> that you merged in your tests). After the test is finished there is
> always at least one task spinning. Do you think it might be related to
> the race we are already discussing in the thread related to my cleanups
> patches?
> This is what I see:

So this is what you reported earlier, right?

> [ 38.843648] other info that might help us debug this:
> [ 38.843648]
> [ 38.867627] Chain exists of:
>   s_active#41 --> &policy->rwsem --> od_dbs_cdata.mutex
>
> [ 38.891693] Possible unsafe locking scenario:
> [ 38.891693]

Let me elaborate on it a bit here:

- CPU0 is calling the governor's EXIT()
- CPU1 is reading a governor file from sysfs

> [ 38.909419]        CPU0                    CPU1
> [ 38.922978]        ----                    ----

The following needs to be added here:

                      EXIT-governor           read/write governor file
                                              lock(s_active#41);

> [ 38.936535]   lock(od_dbs_cdata.mutex);
> [ 38.948146]                                lock(&policy->rwsem);
> [ 38.966168]                                lock(od_dbs_cdata.mutex);
> [ 38.985219]   lock(s_active#41);
> [ 38.994923]
> [ 38.994923]  *** DEADLOCK ***

> Now, you already pointed me at a possible fix. I'm going to test that
> (even if I have questions about that patch :)) and see if it makes this
> go away.

@Rafael: Juri is talking about this patch:

http://www.linux-arm.org/git?p=linux-jl.git;a=commit;h=d3eb02ed23732de2c8671377316a190c38b8fe93

Juri, I thought it would fix it earlier (when I wrote it), but it never
did on x86 (while I dropped the rwsem-drop-code around EXIT as well).
And I never came back to it and so never sent it upstream.

-- 
viresh
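
P.S. For illustration only, the lock chain in the report above can be
modelled as a small "held -> wanted" graph and checked for a cycle. This
is a toy sketch, not lockdep's actual algorithm. The names find_cycle,
READER_EDGES and EXIT_EDGES are hypothetical, and the edges are my
reading of the scenario above: the sysfs reader holds s_active, then
takes policy->rwsem and od_dbs_cdata.mutex, while the EXIT path holds
od_dbs_cdata.mutex and then needs s_active.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle in a lock-order graph as a list of names, or None.

    Each edge (held, wanted) means "some path acquired `wanted` while
    already holding `held`".
    """
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    visiting = []   # locks on the current DFS path, in acquisition order
    done = set()    # locks already fully explored without finding a cycle

    def dfs(node):
        if node in visiting:
            # Back edge: the path from `node` to here closes a cycle.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in graph[node]:
            cycle = dfs(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


# Assumed edges for CPU1 (reading/writing a governor file from sysfs).
READER_EDGES = [
    ("s_active", "policy->rwsem"),
    ("policy->rwsem", "od_dbs_cdata.mutex"),
]

# Assumed edge for CPU0 (governor EXIT removing the sysfs directory).
EXIT_EDGES = [
    ("od_dbs_cdata.mutex", "s_active"),
]

if __name__ == "__main__":
    print(find_cycle(READER_EDGES))
    print(find_cycle(READER_EDGES + EXIT_EDGES))
```

With only the reader edges the graph has no cycle. Adding the EXIT edge
closes the loop, which is the inversion the "Possible unsafe locking
scenario" table describes.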