Linux-ACPI Archive on lore.kernel.org
 help / color / Atom feed
From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
	netdev@vger.kernel.org, linux-wireless@vger.kernel.org,
	platform-driver-x86@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-renesas-soc@vger.kernel.org,
	linux-rockchip@lists.infradead.org
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>,
	"Len Brown" <lenb@kernel.org>,
	"Vishal Kulkarni" <vishal@chelsio.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Jiri Pirko" <jiri@mellanox.com>,
	"Ido Schimmel" <idosch@mellanox.com>,
	"Johannes Berg" <johannes.berg@intel.com>,
	"Emmanuel Grumbach" <emmanuel.grumbach@intel.com>,
	"Luca Coelho" <luciano.coelho@intel.com>,
	"Intel Linux Wireless" <linuxwifi@intel.com>,
	"Kalle Valo" <kvalo@codeaurora.org>,
	"Peter Kaestle" <peter@piie.net>,
	"Darren Hart" <dvhart@infradead.org>,
	"Andy Shevchenko" <andy@infradead.org>,
	"Sebastian Reichel" <sre@kernel.org>,
	"Miquel Raynal" <miquel.raynal@bootlin.com>,
	"Amit Kucheria" <amit.kucheria@verdurent.com>,
	"Support Opensource" <support.opensource@diasemi.com>,
	"Shawn Guo" <shawnguo@kernel.org>,
	"Sascha Hauer" <s.hauer@pengutronix.de>,
	"Pengutronix Kernel Team" <kernel@pengutronix.de>,
	"Fabio Estevam" <festevam@gmail.com>,
	"NXP Linux Team" <linux-imx@nxp.com>,
	"Niklas Söderlund" <niklas.soderlund@ragnatech.se>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Orson Zhai" <orsonzhai@gmail.com>,
	"Baolin Wang" <baolin.wang7@gmail.com>,
	"Chunyan Zhang" <zhang.lyra@gmail.com>,
	"Zhang Rui" <rui.zhang@intel.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Enrico Weigelt" <info@metux.net>,
	"Gayatri Kammela" <gayatri.kammela@intel.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Bartlomiej Zolnierkiewicz" <b.zolnierkie@samsung.com>,
	kernel@collabora.com
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices
Date: Thu, 2 Jul 2020 19:49:51 +0200
Message-ID: <d41bf28f-ee91-6946-2334-f11ec81f96fe@linaro.org> (raw)
In-Reply-To: <685ef627-e377-bbf1-da11-7f7556ca2dd7@collabora.com>

On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote:
> Hi,
> 
> W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
>> On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
>>> Hi Daniel,
>>>
>>> <snip>
>>>
>>>>>>>>
>>>>>>>> I did reproduce:
>>>>>>>>
>>>>>>>> v5.8-rc3 + series => imx6 hang at boot time
>>>>>>>> v5.8-rc3 => imx6 boots correctly
>>>>
>>>> So finally I succeeded to reproduce it on my imx7 locally. The sensor
>>>> was failing to initialize for another reason related to the legacy
>>>> cooling device, this is why it is not appearing on the imx7.
>>>>
>>>> I can now git-bisect :)
>>>>
>>>
>>> That would be very kind of you, thank you!
>>
>> With the lock correctness option enabled:
>>
>> [    4.179223] imx_thermal tempmon: Extended Commercial CPU temperature
>> grade - max:105C critical:100C passive:95C
>> [    4.189557]
>> [    4.191060] ============================================
>> [    4.196378] WARNING: possible recursive locking detected
>> [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
>> [    4.207102] --------------------------------------------
>> [    4.212421] kworker/0:3/54 is trying to acquire lock:
>> [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_device_is_enabled+0x18/0x34
>> [    4.225777]
>> [    4.225777] but task is already holding lock:
>> [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_get_temp+0x38/0x6c
>> [    4.239121]
>> [    4.239121] other info that might help us debug this:
>> [    4.245655]  Possible unsafe locking scenario:
>> [    4.245655]
>> [    4.251579]        CPU0
>> [    4.254031]        ----
>> [    4.256481]   lock(&tz->lock);
>> [    4.259544]   lock(&tz->lock);
>> [    4.262608]
>> [    4.262608]  *** DEADLOCK ***
>> [    4.262608]
>> [    4.268533]  May be due to missing lock nesting notation
>> [    4.268533]
>> [    4.275329] 4 locks held by kworker/0:3/54:
>> [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, at:
>> process_one_work+0x224/0x808
>> [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0}, at:
>> process_one_work+0x224/0x808
>> [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
>> __device_attach+0x30/0x140
>> [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
>> thermal_zone_get_temp+0x38/0x6c
>> [    4.312408]
>> [    4.312408] stack backtrace:
>> [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
>> 5.8.0-rc3-00011-gf5e50bf4d3ef #42
>> [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
>> [    4.330809] Workqueue: events deferred_probe_work_func
>> [    4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
>> (show_stack+0x10/0x14)
>> [    4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
>> (dump_stack+0xe8/0x114)
>> [    4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
>> (__lock_acquire+0xbfc/0x2cb4)
>> [    4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
>> (lock_acquire+0xf4/0x4e4)
>> [    4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
>> (__mutex_lock+0xb0/0xaa8)
>> [    4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
>> (mutex_lock_nested+0x1c/0x24)
>> [    4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
>> (thermal_zone_device_is_enabled+0x18/0x34)
>> [    4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
>> [<c0d9da90>] (imx_get_temp+0x30/0x208)
>> [    4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
>> (thermal_zone_get_temp+0x4c/0x6c)
>> [    4.409640] [<c0d97484>] (thermal_zone_get_temp) from [<c0d93df0>]
>> (thermal_zone_device_update+0x8c/0x258)
>> [    4.419310] [<c0d93df0>] (thermal_zone_device_update) from
>> [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
>> [    4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
>> [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
>> [    4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
>> (platform_drv_probe+0x48/0x98)
>> [    4.447622] [<c0a78388>] (platform_drv_probe) from [<c0a7603c>]
>> (really_probe+0x218/0x348)
>> [    4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
>> (driver_probe_device+0x5c/0xb4)
>> [    4.464098] [<c0a76278>] (driver_probe_device) from [<c0a743bc>]
>> (bus_for_each_drv+0x58/0xb8)
>> [    4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
>> (__device_attach+0xd4/0x140)
>> [    4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
>> (bus_probe_device+0x88/0x90)
>> [    4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
>> (deferred_probe_work_func+0x68/0x98)
>> [    4.498088] [<c0a75564>] (deferred_probe_work_func) from [<c0369988>]
>> (process_one_work+0x2e0/0x808)
>> [    4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
>> (worker_thread+0x2a0/0x59c)
>> [    4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
>> (kthread+0x16c/0x178)
>> [    4.522843] [<c0372208>] (kthread) from [<c0300174>]
>> (ret_from_fork+0x14/0x20)
>> [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
>> [    4.535138] 5fa0:                                     00000000
>> 00000000 00000000 00000000
>> [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
>> 00000000 00000000 00000000
>> [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013
>> 00000000
>>
>>
>>
> 
> Thanks!
> 
> That confirms your suspicions.
> 
> So the reason is that ->get_temp() is called while the mutex is held and
> thermal_zone_device_is_enabled() wants to take the same mutex.

Yes, that's correct.

> Is adding a comment to thermal_zone_device_is_enabled() to never call
> it while the mutex is held and adding another version of it which does
> not take the mutex ok?

The thermal_zone_device_is_enabled() is only used in two places, acpi
and this imx driver, and given:

1. as soon as the mutex is released, there is no guarantee the thermal
zone won't be changed right after, the lock is pointless, thus the
information also.

2. from a design point of view, I don't see why a driver should know if
a thermal zone is disabled or not

It would make sense to end with this function and do not give the
different drivers an opportunity to access this information.

Why not add change_mode for the acpi in order to enable or disable the
events and for imx_thermal use irq_enabled flag instead of the thermal
zone mode? Moreover it is very unclear why this function is needed in
imx_get_temp(), and I suspect we should be able to get rid of it.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

  reply index

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-29 12:29 Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 01/11] acpi: thermal: Fix error handling in the register function Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 02/11] thermal: Store thermal mode in a dedicated enum Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 03/11] thermal: Add current mode to thermal zone device Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 04/11] thermal: Store device mode in struct thermal_zone_device Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 05/11] thermal: remove get_mode() operation of drivers Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 06/11] thermal: Add mode helpers Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 07/11] thermal: Use mode helpers in drivers Andrzej Pietrasiewicz
2020-06-29 12:45   ` Amit Kucheria
2020-06-29 14:14   ` Bartlomiej Zolnierkiewicz
2020-06-29 12:29 ` [PATCH v7 08/11] thermal: Explicitly enable non-changing thermal zone devices Andrzej Pietrasiewicz
2020-06-29 14:15   ` Bartlomiej Zolnierkiewicz
2020-06-30  5:07   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 09/11] thermal: core: Stop polling DISABLED thermal devices Andrzej Pietrasiewicz
2020-06-29 12:46   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 10/11] thermal: Simplify or eliminate unnecessary set_mode() methods Andrzej Pietrasiewicz
2020-06-29 12:47   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 11/11] thermal: Rename set_mode() to change_mode() Andrzej Pietrasiewicz
2020-06-29 12:48   ` Amit Kucheria
2020-06-30 12:57 ` [PATCH v7 00/11] Stop monitoring disabled devices Daniel Lezcano
2020-06-30 13:43   ` Andrzej Pietrasiewicz
2020-06-30 14:53     ` Daniel Lezcano
2020-06-30 15:29       ` Andrzej Pietrasiewicz
2020-06-30 15:53         ` Daniel Lezcano
2020-06-30 16:56           ` Andrzej Pietrasiewicz
2020-06-30 18:33             ` Daniel Lezcano
2020-07-01 10:23               ` Andrzej Pietrasiewicz
2020-07-01 14:25                 ` Daniel Lezcano
2020-07-02 13:47                 ` Daniel Lezcano
2020-07-02 13:53                   ` Andrzej Pietrasiewicz
2020-07-02 14:58                     ` Daniel Lezcano
2020-07-02 17:01                     ` Daniel Lezcano
2020-07-02 17:19                       ` Andrzej Pietrasiewicz
2020-07-02 17:49                         ` Daniel Lezcano [this message]
2020-07-02 17:52                           ` Daniel Lezcano
2020-07-03  1:49                           ` Zhang Rui
2020-07-03  6:38                             ` Daniel Lezcano
2020-07-03 10:45                               ` Andrzej Pietrasiewicz
2020-07-03 11:05                                 ` Daniel Lezcano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d41bf28f-ee91-6946-2334-f11ec81f96fe@linaro.org \
    --to=daniel.lezcano@linaro.org \
    --cc=allison@lohutok.net \
    --cc=amit.kucheria@verdurent.com \
    --cc=andrzej.p@collabora.com \
    --cc=andy@infradead.org \
    --cc=b.zolnierkie@samsung.com \
    --cc=baolin.wang7@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dvhart@infradead.org \
    --cc=emmanuel.grumbach@intel.com \
    --cc=festevam@gmail.com \
    --cc=gayatri.kammela@intel.com \
    --cc=heiko@sntech.de \
    --cc=idosch@mellanox.com \
    --cc=info@metux.net \
    --cc=jiri@mellanox.com \
    --cc=johannes.berg@intel.com \
    --cc=kernel@collabora.com \
    --cc=kernel@pengutronix.de \
    --cc=kvalo@codeaurora.org \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-imx@nxp.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=linuxwifi@intel.com \
    --cc=luciano.coelho@intel.com \
    --cc=miquel.raynal@bootlin.com \
    --cc=netdev@vger.kernel.org \
    --cc=niklas.soderlund@ragnatech.se \
    --cc=orsonzhai@gmail.com \
    --cc=peter@piie.net \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=rui.zhang@intel.com \
    --cc=s.hauer@pengutronix.de \
    --cc=shawnguo@kernel.org \
    --cc=sre@kernel.org \
    --cc=support.opensource@diasemi.com \
    --cc=tglx@linutronix.de \
    --cc=vishal@chelsio.com \
    --cc=zhang.lyra@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ACPI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-acpi/0 linux-acpi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-acpi linux-acpi/ https://lore.kernel.org/linux-acpi \
		linux-acpi@vger.kernel.org
	public-inbox-index linux-acpi

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-acpi


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git