From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36DF0C433E1 for ; Thu, 2 Jul 2020 17:50:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1ABF32088E for ; Thu, 2 Jul 2020 17:50:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="Su8z7mc7" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727996AbgGBRuC (ORCPT ); Thu, 2 Jul 2020 13:50:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50408 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727976AbgGBRt5 (ORCPT ); Thu, 2 Jul 2020 13:49:57 -0400 Received: from mail-wm1-x341.google.com (mail-wm1-x341.google.com [IPv6:2a00:1450:4864:20::341]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A6106C08C5DF for ; Thu, 2 Jul 2020 10:49:56 -0700 (PDT) Received: by mail-wm1-x341.google.com with SMTP id 22so28067077wmg.1 for ; Thu, 02 Jul 2020 10:49:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=xQ/jbEIoi0E5v/8Oaxv5C/iCmVriuxnyUPKYmwbLhJI=; b=Su8z7mc7dYnPY7oZG6Qfy2JYbC+KUSrWJlUQisUIJi7j2laFEpLi08Hzg9aJEscsNx CNgySZxbqQ8zHjlIkbChmzaDwLiFGQwPsO3X5dNSPEv/D5J2XxENVkxE3rCu+0kw3Es7 gAu0S3jeVQa36FEZB8+0pnpQLob/I5kajgBvYbdEAUC2zXC03tOVuW0syjSnQ5dcFOLr ZhU6sAuB6QYQCHmMi/h06UeD2lfNsx6eXpsJmeVe2jge71IivDIZvo67ktv2TyZILyZG EmVXGKoi77uzrMMN0wg4385tboOrgBWpkgPF2IGUOOhSAlQOZN0/WykOwi01hI1ZnfMr vsSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=xQ/jbEIoi0E5v/8Oaxv5C/iCmVriuxnyUPKYmwbLhJI=; b=T4ND8zCjXBHSmIGlRhyojlCqq+EAdG7gdjHO2fd8xeWp3VQMRDWQDvQeWmzAvVrWTz X99BkSz/IX/wKwPwVj1oo304RHv2nX6KzPGMZ5rCB6JlVpiDNLcDpm4kud2UhOlh06zv 5bDIgV94WJdn81y+fWmUC7GOHufy4o6Jr0SjtNqCewrRQpa5KiBSyCSHlYaPHbEVmfcg cqFo8OkXjfc1WwM94amYKtloEL0u9KY5DMdTAbw+IqmnpSitsj0PEXfeNhoCe78KL/0s ys9jihDsTMyTa+FVb5tNizRyB2m1RVdgxq6fw71/t/tJ69pcyMC7N8RaQRBg8f96gFoO C25g== X-Gm-Message-State: AOAM530DfO1A7PCaWEO6+g6YqE4s1kBtnkvOTka8N6pRZwrO7Jm1NNHi KuN8StfOxLgBZnTTxTVp6OnXMQ== X-Google-Smtp-Source: ABdhPJyzRv8aoQm207JFCd7UUYqXvDOaDeuaWNHVTkS4tsgnm3DlNfQKfqswQjF2MPlESIwAbwvFXw== X-Received: by 2002:a1c:f203:: with SMTP id s3mr31694303wmc.126.1593712194925; Thu, 02 Jul 2020 10:49:54 -0700 (PDT) Received: from ?IPv6:2a01:e34:ed2f:f020:c88a:7b2b:a4a1:46d0? ([2a01:e34:ed2f:f020:c88a:7b2b:a4a1:46d0]) by smtp.googlemail.com with ESMTPSA id b18sm2353454wrs.46.2020.07.02.10.49.52 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 02 Jul 2020 10:49:54 -0700 (PDT) Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices To: Andrzej Pietrasiewicz , linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org, netdev@vger.kernel.org, linux-wireless@vger.kernel.org, platform-driver-x86@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-renesas-soc@vger.kernel.org, linux-rockchip@lists.infradead.org Cc: "Rafael J . Wysocki" , Len Brown , Vishal Kulkarni , "David S . Miller" , Jiri Pirko , Ido Schimmel , Johannes Berg , Emmanuel Grumbach , Luca Coelho , Intel Linux Wireless , Kalle Valo , Peter Kaestle , Darren Hart , Andy Shevchenko , Sebastian Reichel , Miquel Raynal , Amit Kucheria , Support Opensource , Shawn Guo , Sascha Hauer , Pengutronix Kernel Team , Fabio Estevam , NXP Linux Team , =?UTF-8?Q?Niklas_S=c3=b6derlund?= , Heiko Stuebner , Orson Zhai , Baolin Wang , Chunyan Zhang , Zhang Rui , Allison Randal , Enrico Weigelt , Gayatri Kammela , Thomas Gleixner , Bartlomiej Zolnierkiewicz , kernel@collabora.com References: <20200629122925.21729-1-andrzej.p@collabora.com> <3d03d1a2-ac06-b69b-93cb-e0203be62c10@collabora.com> <47111821-d691-e71d-d740-e4325e290fa4@linaro.org> <4353a939-3f5e-8369-5bc0-ad8162b5ffc7@linaro.org> <73942aea-ae79-753c-fe90-d4a99423d548@collabora.com> <374dddd9-b600-3a30-d6c3-8cfcefc944d9@linaro.org> <5a28deb7-f307-8b03-faad-ab05cb8095d1@collabora.com> <8aeb4f51-1813-63c1-165b-06640af5968f@linaro.org> <685ef627-e377-bbf1-da11-7f7556ca2dd7@collabora.com> From: Daniel Lezcano Message-ID: Date: Thu, 2 Jul 2020 19:49:51 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: <685ef627-e377-bbf1-da11-7f7556ca2dd7@collabora.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote: > Hi, > > W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze: >> On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote: >>> Hi Daniel, >>> >>> >>> >>>>>>>> >>>>>>>> I did reproduce: >>>>>>>> >>>>>>>> v5.8-rc3 + series => imx6 hang at boot time >>>>>>>> v5.8-rc3 => imx6 boots correctly >>>> >>>> So finally I succeeded to reproduce it on my imx7 locally. The sensor >>>> was failing to initialize for another reason related to the legacy >>>> cooling device, this is why it is not appearing on the imx7. >>>> >>>> I can now git-bisect :) >>>> >>> >>> That would be very kind of you, thank you! >> >> With the lock correctness option enabled: >> >> [    4.179223] imx_thermal tempmon: Extended Commercial CPU temperature >> grade - max:105C critical:100C passive:95C >> [    4.189557] >> [    4.191060] ============================================ >> [    4.196378] WARNING: possible recursive locking detected >> [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted >> [    4.207102] -------------------------------------------- >> [    4.212421] kworker/0:3/54 is trying to acquire lock: >> [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: >> thermal_zone_device_is_enabled+0x18/0x34 >> [    4.225777] >> [    4.225777] but task is already holding lock: >> [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: >> thermal_zone_get_temp+0x38/0x6c >> [    4.239121] >> [    4.239121] other info that might help us debug this: >> [    4.245655]  Possible unsafe locking scenario: >> [    4.245655] >> [    4.251579]        CPU0 >> [    4.254031]        ---- >> [    4.256481]   lock(&tz->lock); >> [    4.259544]   lock(&tz->lock); >> [    4.262608] >> [    4.262608]  *** DEADLOCK *** >> [    4.262608] >> [    4.268533]  May be due to missing lock nesting notation >> [    4.268533] >> [    4.275329] 4 locks held by kworker/0:3/54: >> [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at: >> process_one_work+0x224/0x808 >> [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at: >> process_one_work+0x224/0x808 >> [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at: >> __device_attach+0x30/0x140 >> [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at: >> thermal_zone_get_temp+0x38/0x6c >> [    4.312408] >> [    4.312408] stack backtrace: >> [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted >> 5.8.0-rc3-00011-gf5e50bf4d3ef #42 >> [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree) >> [    4.330809] Workqueue: events deferred_probe_work_func >> [    4.335973] [] (unwind_backtrace) from [] >> (show_stack+0x10/0x14) >> [    4.343734] [] (show_stack) from [] >> (dump_stack+0xe8/0x114) >> [    4.351062] [] (dump_stack) from [] >> (__lock_acquire+0xbfc/0x2cb4) >> [    4.358909] [] (__lock_acquire) from [] >> (lock_acquire+0xf4/0x4e4) >> [    4.366758] [] (lock_acquire) from [] >> (__mutex_lock+0xb0/0xaa8) >> [    4.374431] [] (__mutex_lock) from [] >> (mutex_lock_nested+0x1c/0x24) >> [    4.382452] [] (mutex_lock_nested) from [] >> (thermal_zone_device_is_enabled+0x18/0x34) >> [    4.392036] [] (thermal_zone_device_is_enabled) from >> [] (imx_get_temp+0x30/0x208) >> [    4.401271] [] (imx_get_temp) from [] >> (thermal_zone_get_temp+0x4c/0x6c) >> [    4.409640] [] (thermal_zone_get_temp) from [] >> (thermal_zone_device_update+0x8c/0x258) >> [    4.419310] [] (thermal_zone_device_update) from >> [] (thermal_zone_device_set_mode+0x60/0x88) >> [    4.429500] [] (thermal_zone_device_set_mode) from >> [] (imx_thermal_probe+0x3e4/0x578) >> [    4.439082] [] (imx_thermal_probe) from [] >> (platform_drv_probe+0x48/0x98) >> [    4.447622] [] (platform_drv_probe) from [] >> (really_probe+0x218/0x348) >> [    4.455903] [] (really_probe) from [] >> (driver_probe_device+0x5c/0xb4) >> [    4.464098] [] (driver_probe_device) from [] >> (bus_for_each_drv+0x58/0xb8) >> [    4.472638] [] (bus_for_each_drv) from [] >> (__device_attach+0xd4/0x140) >> [    4.480919] [] (__device_attach) from [] >> (bus_probe_device+0x88/0x90) >> [    4.489112] [] (bus_probe_device) from [] >> (deferred_probe_work_func+0x68/0x98) >> [    4.498088] [] (deferred_probe_work_func) from [] >> (process_one_work+0x2e0/0x808) >> [    4.507237] [] (process_one_work) from [] >> (worker_thread+0x2a0/0x59c) >> [    4.515432] [] (worker_thread) from [] >> (kthread+0x16c/0x178) >> [    4.522843] [] (kthread) from [] >> (ret_from_fork+0x14/0x20) >> [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8) >> [    4.535138] 5fa0:                                     00000000 >> 00000000 00000000 00000000 >> [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000 >> 00000000 00000000 00000000 >> [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013 >> 00000000 >> >> >> > > Thanks! > > That confirms your suspicions. > > So the reason is that ->get_temp() is called while the mutex is held and > thermal_zone_device_is_enabled() wants to take the same mutex. Yes, that's correct. > Is adding a comment to thermal_zone_device_is_enabled() to never call > it while the mutex is held and adding another version of it which does > not take the mutex ok? The thermal_zone_device_is_enabled() is only used in two places, acpi and this imx driver, and given: 1. as soon as the mutex is released, there is no guarantee the thermal zone won't be changed right after, the lock is pointless, thus the information also. 2. from a design point of view, I don't see why a driver should know if a thermal zone is disabled or not It would make sense to end with this function and do not give the different drivers an opportunity to access this information. Why not add change_mode for the acpi in order to enable or disable the events and for imx_thermal use irq_enabled flag instead of the thermal zone mode? Moreover it is very unclear why this function is needed in imx_get_temp(), and I suspect we should be able to get rid of it. -- Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog