linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
	netdev@vger.kernel.org, linux-wireless@vger.kernel.org,
	platform-driver-x86@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-renesas-soc@vger.kernel.org,
	linux-rockchip@lists.infradead.org
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>,
	"Len Brown" <lenb@kernel.org>,
	"Vishal Kulkarni" <vishal@chelsio.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Jiri Pirko" <jiri@mellanox.com>,
	"Ido Schimmel" <idosch@mellanox.com>,
	"Johannes Berg" <johannes.berg@intel.com>,
	"Emmanuel Grumbach" <emmanuel.grumbach@intel.com>,
	"Luca Coelho" <luciano.coelho@intel.com>,
	"Intel Linux Wireless" <linuxwifi@intel.com>,
	"Kalle Valo" <kvalo@codeaurora.org>,
	"Peter Kaestle" <peter@piie.net>,
	"Darren Hart" <dvhart@infradead.org>,
	"Andy Shevchenko" <andy@infradead.org>,
	"Sebastian Reichel" <sre@kernel.org>,
	"Miquel Raynal" <miquel.raynal@bootlin.com>,
	"Amit Kucheria" <amit.kucheria@verdurent.com>,
	"Support Opensource" <support.opensource@diasemi.com>,
	"Shawn Guo" <shawnguo@kernel.org>,
	"Sascha Hauer" <s.hauer@pengutronix.de>,
	"Pengutronix Kernel Team" <kernel@pengutronix.de>,
	"Fabio Estevam" <festevam@gmail.com>,
	"NXP Linux Team" <linux-imx@nxp.com>,
	"Niklas Söderlund" <niklas.soderlund@ragnatech.se>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Orson Zhai" <orsonzhai@gmail.com>,
	"Baolin Wang" <baolin.wang7@gmail.com>,
	"Chunyan Zhang" <zhang.lyra@gmail.com>,
	"Zhang Rui" <rui.zhang@intel.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Enrico Weigelt" <info@metux.net>,
	"Gayatri Kammela" <gayatri.kammela@intel.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Bartlomiej Zolnierkiewicz" <b.zolnierkie@samsung.com>,
	kernel@collabora.com
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices
Date: Thu, 2 Jul 2020 19:49:51 +0200	[thread overview]
Message-ID: <d41bf28f-ee91-6946-2334-f11ec81f96fe@linaro.org> (raw)
In-Reply-To: <685ef627-e377-bbf1-da11-7f7556ca2dd7@collabora.com>

On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote:
> Hi,
> 
> W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
>> On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
>>> Hi Daniel,
>>>
>>> <snip>
>>>
>>>>>>>>
>>>>>>>> I did reproduce:
>>>>>>>>
>>>>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>>>>> v5.8-rc3 => imx6 boots correctly
>>>>
>>>> So finally I succeeded to reproduce it on my imx7 locally. The sensor
>>>> was failing to initialize for another reason related to the legacy
>>>> cooling device, this is why it is not appearing on the imx7.
>>>>
>>>> I can now git-bisect :)
>>>>
>>>
>>> That would be very kind of you, thank you!
>>
>> With the lock correctness option enabled:
>>
>> [    4.179223] imx_thermal tempmon: Extended Commercial CPU temperature
>> grade - max:105C critical:100C passive:95C
>> [    4.189557]
>> [    4.191060] ============================================
>> [    4.196378] WARNING: possible recursive locking detected
>> [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
>> [    4.207102] --------------------------------------------
>> [    4.212421] kworker/0:3/54 is trying to acquire lock:
>> [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_device_is_enabled+0x18/0x34
>> [    4.225777]
>> [    4.225777] but task is already holding lock:
>> [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_get_temp+0x38/0x6c
>> [    4.239121]
>> [    4.239121] other info that might help us debug this:
>> [    4.245655]  Possible unsafe locking scenario:
>> [    4.245655]
>> [    4.251579]        CPU0
>> [    4.254031]        ----
>> [    4.256481]   lock(&tz->lock);
>> [    4.259544]   lock(&tz->lock);
>> [    4.262608]
>> [    4.262608]  *** DEADLOCK ***
>> [    4.262608]
>> [    4.268533]  May be due to missing lock nesting notation
>> [    4.268533]
>> [    4.275329] 4 locks held by kworker/0:3/54:
>> [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at:
>> process_one_work+0x224/0x808
>> [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at:
>> process_one_work+0x224/0x808
>> [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
>> __device_attach+0x30/0x140
>> [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_get_temp+0x38/0x6c
>> [    4.312408]
>> [    4.312408] stack backtrace:
>> [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
>> 5.8.0-rc3-00011-gf5e50bf4d3ef #42
>> [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
>> [    4.330809] Workqueue: events deferred_probe_work_func
>> [    4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
>> (show_stack+0x10/0x14)
>> [    4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
>> (dump_stack+0xe8/0x114)
>> [    4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
>> (__lock_acquire+0xbfc/0x2cb4)
>> [    4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
>> (lock_acquire+0xf4/0x4e4)
>> [    4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
>> (__mutex_lock+0xb0/0xaa8)
>> [    4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
>> (mutex_lock_nested+0x1c/0x24)
>> [    4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
>> (thermal_zone_device_is_enabled+0x18/0x34)
>> [    4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
>> [<c0d9da90>] (imx_get_temp+0x30/0x208)
>> [    4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
>> (thermal_zone_get_temp+0x4c/0x6c)
>> [    4.409640] [<c0d97484>] (thermal_zone_get_temp) from [<c0d93df0>]
>> (thermal_zone_device_update+0x8c/0x258)
>> [    4.419310] [<c0d93df0>] (thermal_zone_device_update) from
>> [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
>> [    4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
>> [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
>> [    4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
>> (platform_drv_probe+0x48/0x98)
>> [    4.447622] [<c0a78388>] (platform_drv_probe) from [<c0a7603c>]
>> (really_probe+0x218/0x348)
>> [    4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
>> (driver_probe_device+0x5c/0xb4)
>> [    4.464098] [<c0a76278>] (driver_probe_device) from [<c0a743bc>]
>> (bus_for_each_drv+0x58/0xb8)
>> [    4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
>> (__device_attach+0xd4/0x140)
>> [    4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
>> (bus_probe_device+0x88/0x90)
>> [    4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
>> (deferred_probe_work_func+0x68/0x98)
>> [    4.498088] [<c0a75564>] (deferred_probe_work_func) from [<c0369988>]
>> (process_one_work+0x2e0/0x808)
>> [    4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
>> (worker_thread+0x2a0/0x59c)
>> [    4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
>> (kthread+0x16c/0x178)
>> [    4.522843] [<c0372208>] (kthread) from [<c0300174>]
>> (ret_from_fork+0x14/0x20)
>> [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
>> [    4.535138] 5fa0:                                     00000000
>> 00000000 00000000 00000000
>> [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000
>> [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013
>> 00000000
>>
>>
>>
> 
> Thanks!
> 
> That confirms your suspicions.
> 
> So the reason is that ->get_temp() is called while the mutex is held and
> thermal_zone_device_is_enabled() wants to take the same mutex.

Yes, that's correct.

> Is adding a comment to thermal_zone_device_is_enabled() to never call
> it while the mutex is held and adding another version of it which does
> not take the mutex ok?

The thermal_zone_device_is_enabled() is only used in two places, acpi
and this imx driver, and given:

1. as soon as the mutex is released, there is no guarantee the thermal
zone won't be changed right after, the lock is pointless, thus the
information also.

2. from a design point of view, I don't see why a driver should know if
a thermal zone is disabled or not

It would make sense to end with this function and do not give the
different drivers an opportunity to access this information.

Why not add change_mode for the acpi in order to enable or disable the
events and for imx_thermal use irq_enabled flag instead of the thermal
zone mode? Moreover it is very unclear why this function is needed in
imx_get_temp(), and I suspect we should be able to get rid of it.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

  reply	other threads:[~2020-07-02 17:49 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-29 12:29 [PATCH v7 00/11] Stop monitoring disabled devices Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 01/11] acpi: thermal: Fix error handling in the register function Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 02/11] thermal: Store thermal mode in a dedicated enum Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 03/11] thermal: Add current mode to thermal zone device Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 04/11] thermal: Store device mode in struct thermal_zone_device Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 05/11] thermal: remove get_mode() operation of drivers Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 06/11] thermal: Add mode helpers Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 07/11] thermal: Use mode helpers in drivers Andrzej Pietrasiewicz
2020-06-29 12:45   ` Amit Kucheria
2020-06-29 14:14   ` Bartlomiej Zolnierkiewicz
2020-06-29 12:29 ` [PATCH v7 08/11] thermal: Explicitly enable non-changing thermal zone devices Andrzej Pietrasiewicz
2020-06-29 14:15   ` Bartlomiej Zolnierkiewicz
2020-06-30  5:07   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 09/11] thermal: core: Stop polling DISABLED thermal devices Andrzej Pietrasiewicz
2020-06-29 12:46   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 10/11] thermal: Simplify or eliminate unnecessary set_mode() methods Andrzej Pietrasiewicz
2020-06-29 12:47   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 11/11] thermal: Rename set_mode() to change_mode() Andrzej Pietrasiewicz
2020-06-29 12:48   ` Amit Kucheria
2020-06-30 12:57 ` [PATCH v7 00/11] Stop monitoring disabled devices Daniel Lezcano
2020-06-30 13:43   ` Andrzej Pietrasiewicz
2020-06-30 14:53     ` Daniel Lezcano
2020-06-30 15:29       ` Andrzej Pietrasiewicz
2020-06-30 15:53         ` Daniel Lezcano
2020-06-30 16:56           ` Andrzej Pietrasiewicz
2020-06-30 18:33             ` Daniel Lezcano
2020-07-01 10:23               ` Andrzej Pietrasiewicz
2020-07-01 14:25                 ` Daniel Lezcano
2020-07-02 13:47                 ` Daniel Lezcano
2020-07-02 13:53                   ` Andrzej Pietrasiewicz
2020-07-02 14:58                     ` Daniel Lezcano
2020-07-02 17:01                     ` Daniel Lezcano
2020-07-02 17:19                       ` Andrzej Pietrasiewicz
2020-07-02 17:49                         ` Daniel Lezcano [this message]
2020-07-02 17:52                           ` Daniel Lezcano
2020-07-03  1:49                           ` Zhang Rui
2020-07-03  6:38                             ` Daniel Lezcano
2020-07-03 10:45                               ` Andrzej Pietrasiewicz
2020-07-03 11:05                                 ` Daniel Lezcano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d41bf28f-ee91-6946-2334-f11ec81f96fe@linaro.org \
    --to=daniel.lezcano@linaro.org \
    --cc=allison@lohutok.net \
    --cc=amit.kucheria@verdurent.com \
    --cc=andrzej.p@collabora.com \
    --cc=andy@infradead.org \
    --cc=b.zolnierkie@samsung.com \
    --cc=baolin.wang7@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dvhart@infradead.org \
    --cc=emmanuel.grumbach@intel.com \
    --cc=festevam@gmail.com \
    --cc=gayatri.kammela@intel.com \
    --cc=heiko@sntech.de \
    --cc=idosch@mellanox.com \
    --cc=info@metux.net \
    --cc=jiri@mellanox.com \
    --cc=johannes.berg@intel.com \
    --cc=kernel@collabora.com \
    --cc=kernel@pengutronix.de \
    --cc=kvalo@codeaurora.org \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-imx@nxp.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=linuxwifi@intel.com \
    --cc=luciano.coelho@intel.com \
    --cc=miquel.raynal@bootlin.com \
    --cc=netdev@vger.kernel.org \
    --cc=niklas.soderlund@ragnatech.se \
    --cc=orsonzhai@gmail.com \
    --cc=peter@piie.net \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=rui.zhang@intel.com \
    --cc=s.hauer@pengutronix.de \
    --cc=shawnguo@kernel.org \
    --cc=sre@kernel.org \
    --cc=support.opensource@diasemi.com \
    --cc=tglx@linutronix.de \
    --cc=vishal@chelsio.com \
    --cc=zhang.lyra@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).