From: Andrzej Pietrasiewicz
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices
To: Daniel Lezcano, linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
 netdev@vger.kernel.org, linux-wireless@vger.kernel.org,
 platform-driver-x86@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-renesas-soc@vger.kernel.org, linux-rockchip@lists.infradead.org
Cc: "Rafael J . Wysocki", Len Brown, Vishal Kulkarni, "David S . Miller",
 Jiri Pirko, Ido Schimmel, Johannes Berg, Emmanuel Grumbach, Luca Coelho,
 Intel Linux Wireless, Kalle Valo, Peter Kaestle, Darren Hart,
 Andy Shevchenko, Sebastian Reichel, Miquel Raynal, Amit Kucheria,
 Support Opensource, Shawn Guo, Sascha Hauer, Pengutronix Kernel Team,
 Fabio Estevam, NXP Linux Team, Niklas Söderlund, Heiko Stuebner,
 Orson Zhai, Baolin Wang, Chunyan Zhang, Zhang Rui, Allison Randal,
 Enrico Weigelt, Gayatri Kammela, Thomas Gleixner,
 Bartlomiej Zolnierkiewicz, kernel@collabora.com
References: <20200629122925.21729-1-andrzej.p@collabora.com>
 <3d03d1a2-ac06-b69b-93cb-e0203be62c10@collabora.com>
 <47111821-d691-e71d-d740-e4325e290fa4@linaro.org>
 <4353a939-3f5e-8369-5bc0-ad8162b5ffc7@linaro.org>
 <73942aea-ae79-753c-fe90-d4a99423d548@collabora.com>
 <374dddd9-b600-3a30-d6c3-8cfcefc944d9@linaro.org>
 <5a28deb7-f307-8b03-faad-ab05cb8095d1@collabora.com>
 <8aeb4f51-1813-63c1-165b-06640af5968f@linaro.org>
Message-ID: <685ef627-e377-bbf1-da11-7f7556ca2dd7@collabora.com>
In-Reply-To: <8aeb4f51-1813-63c1-165b-06640af5968f@linaro.org>
Date: Thu, 2 Jul 2020 19:19:32 +0200
X-Mailing-List: linux-pm@vger.kernel.org

Hi,

On 02.07.2020 at 19:01, Daniel Lezcano wrote:
> On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
>> Hi Daniel,
>>
>>
>>
>>>>>>>
>>>>>>> I did reproduce:
>>>>>>>
>>>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>>>> v5.8-rc3 => imx6 boots correctly
>>>
>>> So finally I succeeded to reproduce it on my imx7 locally. The sensor
>>> was failing to initialize for another reason related to the legacy
>>> cooling device, this is why it is not appearing on the imx7.
>>>
>>> I can now git-bisect :)
>>>
>>
>> That would be very kind of you, thank you!
>
> With the lock correctness option enabled:
>
> [ 4.179223] imx_thermal tempmon: Extended Commercial CPU temperature
> grade - max:105C critical:100C passive:95C
> [ 4.189557]
> [ 4.191060] ============================================
> [ 4.196378] WARNING: possible recursive locking detected
> [ 4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
> [ 4.207102] --------------------------------------------
> [ 4.212421] kworker/0:3/54 is trying to acquire lock:
> [ 4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_device_is_enabled+0x18/0x34
> [ 4.225777]
> [ 4.225777] but task is already holding lock:
> [ 4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_get_temp+0x38/0x6c
> [ 4.239121]
> [ 4.239121] other info that might help us debug this:
> [ 4.245655] Possible unsafe locking scenario:
> [ 4.245655]
> [ 4.251579] CPU0
> [ 4.254031] ----
> [ 4.256481] lock(&tz->lock);
> [ 4.259544] lock(&tz->lock);
> [ 4.262608]
> [ 4.262608] *** DEADLOCK ***
> [ 4.262608]
> [ 4.268533] May be due to missing lock nesting notation
> [ 4.268533]
> [ 4.275329] 4 locks held by kworker/0:3/54:
> [ 4.279517] #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at:
> process_one_work+0x224/0x808
> [ 4.288241] #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at:
> process_one_work+0x224/0x808
> [ 4.296787] #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
> __device_attach+0x30/0x140
> [ 4.304468] #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> thermal_zone_get_temp+0x38/0x6c
> [ 4.312408]
> [ 4.312408] stack backtrace:
> [ 4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
> 5.8.0-rc3-00011-gf5e50bf4d3ef #42
> [ 4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
> [ 4.330809] Workqueue: events deferred_probe_work_func
> [ 4.335973] [] (unwind_backtrace) from []
> (show_stack+0x10/0x14)
> [ 4.343734] [] (show_stack) from []
> (dump_stack+0xe8/0x114)
> [ 4.351062] [] (dump_stack) from []
> (__lock_acquire+0xbfc/0x2cb4)
> [ 4.358909] [] (__lock_acquire) from []
> (lock_acquire+0xf4/0x4e4)
> [ 4.366758] [] (lock_acquire) from []
> (__mutex_lock+0xb0/0xaa8)
> [ 4.374431] [] (__mutex_lock) from []
> (mutex_lock_nested+0x1c/0x24)
> [ 4.382452] [] (mutex_lock_nested) from []
> (thermal_zone_device_is_enabled+0x18/0x34)
> [ 4.392036] [] (thermal_zone_device_is_enabled) from
> [] (imx_get_temp+0x30/0x208)
> [ 4.401271] [] (imx_get_temp) from []
> (thermal_zone_get_temp+0x4c/0x6c)
> [ 4.409640] [] (thermal_zone_get_temp) from []
> (thermal_zone_device_update+0x8c/0x258)
> [ 4.419310] [] (thermal_zone_device_update) from
> [] (thermal_zone_device_set_mode+0x60/0x88)
> [ 4.429500] [] (thermal_zone_device_set_mode) from
> [] (imx_thermal_probe+0x3e4/0x578)
> [ 4.439082] [] (imx_thermal_probe) from []
> (platform_drv_probe+0x48/0x98)
> [ 4.447622] [] (platform_drv_probe) from []
> (really_probe+0x218/0x348)
> [ 4.455903] [] (really_probe) from []
> (driver_probe_device+0x5c/0xb4)
> [ 4.464098] [] (driver_probe_device) from []
> (bus_for_each_drv+0x58/0xb8)
> [ 4.472638] [] (bus_for_each_drv) from []
> (__device_attach+0xd4/0x140)
> [ 4.480919] [] (__device_attach) from []
> (bus_probe_device+0x88/0x90)
> [ 4.489112] [] (bus_probe_device) from []
> (deferred_probe_work_func+0x68/0x98)
> [ 4.498088] [] (deferred_probe_work_func) from []
> (process_one_work+0x2e0/0x808)
> [ 4.507237] [] (process_one_work) from []
> (worker_thread+0x2a0/0x59c)
> [ 4.515432] [] (worker_thread) from []
> (kthread+0x16c/0x178)
> [ 4.522843] [] (kthread) from []
> (ret_from_fork+0x14/0x20)
> [ 4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
> [ 4.535138] 5fa0: 00000000
> 00000000 00000000 00000000
> [ 4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [ 4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>
>
>

Thanks! That confirms your suspicions.
So the reason is that thermal_zone_get_temp() calls ->get_temp() (here
imx_get_temp()) while holding tz->lock, and imx_get_temp() then calls
thermal_zone_device_is_enabled(), which wants to take the same mutex.

Would it be OK to add a comment to thermal_zone_device_is_enabled()
saying that it must never be called while the mutex is held, and to add
another version of it which does not take the mutex?

Andrzej
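
For illustration only, the locked/unlocked split proposed above could look
roughly like the user-space sketch below. It uses a pthread mutex instead of
a kernel mutex, and every name in it (struct zone, zone_is_enabled(),
zone_is_enabled_locked(), driver_get_temp(), zone_get_temp()) is a
hypothetical stand-in, not the actual thermal core API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a thermal zone. */
struct zone {
	pthread_mutex_t lock;
	bool enabled;
	int raw_temp;
};

/* Unlocked variant: the caller must already hold z->lock. */
bool zone_is_enabled_locked(const struct zone *z)
{
	return z->enabled;
}

/* Locking variant: must never be called with z->lock held. */
bool zone_is_enabled(struct zone *z)
{
	bool enabled;

	pthread_mutex_lock(&z->lock);
	enabled = zone_is_enabled_locked(z);
	pthread_mutex_unlock(&z->lock);

	return enabled;
}

/*
 * Stand-in for a driver ->get_temp() callback. It runs with z->lock
 * held, so it has to use the unlocked variant; calling
 * zone_is_enabled() here would try to take the same lock again.
 */
int driver_get_temp(struct zone *z, int *temp)
{
	assert(temp != NULL);

	if (!zone_is_enabled_locked(z))
		return -1;

	*temp = z->raw_temp;
	return 0;
}

/* Stand-in for the core helper that takes the lock around the callback. */
int zone_get_temp(struct zone *z, int *temp)
{
	int ret;

	pthread_mutex_lock(&z->lock);
	ret = driver_get_temp(z, temp);
	pthread_mutex_unlock(&z->lock);

	return ret;
}
```

In the kernel, the unlocked variant could additionally state its requirement
with lockdep_assert_held(&tz->lock), so that a caller without the lock is
reported when lock debugging is enabled.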