All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zhang Rui <rui.zhang@intel.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>,
	Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
	netdev@vger.kernel.org, linux-wireless@vger.kernel.org,
	platform-driver-x86@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-renesas-soc@vger.kernel.org,
	linux-rockchip@lists.infradead.org
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>,
	"Len Brown" <lenb@kernel.org>,
	"Vishal Kulkarni" <vishal@chelsio.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Jiri Pirko" <jiri@mellanox.com>,
	"Ido Schimmel" <idosch@mellanox.com>,
	"Johannes Berg" <johannes.berg@intel.com>,
	"Emmanuel Grumbach" <emmanuel.grumbach@intel.com>,
	"Luca Coelho" <luciano.coelho@intel.com>,
	"Intel Linux Wireless" <linuxwifi@intel.com>,
	"Kalle Valo" <kvalo@codeaurora.org>,
	"Peter Kaestle" <peter@piie.net>,
	"Darren Hart" <dvhart@infradead.org>,
	"Andy Shevchenko" <andy@infradead.org>,
	"Sebastian Reichel" <sre@kernel.org>,
	"Miquel Raynal" <miquel.raynal@bootlin.com>,
	"Amit Kucheria" <amit.kucheria@verdurent.com>,
	"Support Opensource" <support.opensource@diasemi.com>,
	"Shawn Guo" <shawnguo@kernel.org>,
	"Sascha Hauer" <s.hauer@pengutronix.de>,
	"Pengutronix Kernel Team" <kernel@pengutronix.de>,
	"Fabio Estevam" <festevam@gmail.com>,
	"NXP Linux Team" <linux-imx@nxp.com>,
	"Niklas Söderlund" <niklas.soderlund@ragnatech.se>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Orson Zhai" <orsonzhai@gmail.com>,
	"Baolin Wang" <baolin.wang7@gmail.com>,
	"Chunyan Zhang" <zhang.lyra@gmail.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Enrico Weigelt" <info@metux.net>,
	"Gayatri Kammela" <gayatri.kammela@intel.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Bartlomiej Zolnierkiewicz" <b.zolnierkie@samsung.com>,
	kernel@collabora.com
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices
Date: Fri, 03 Jul 2020 09:49:15 +0800	[thread overview]
Message-ID: <44c622dd7de8c7bf143c4435c0edd1b98d09a3d6.camel@intel.com> (raw)
In-Reply-To: <d41bf28f-ee91-6946-2334-f11ec81f96fe@linaro.org>

On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote:
> On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote:
> > Hi,
> > 
> > W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
> > > On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
> > > > Hi Daniel,
> > > > 
> > > > <snip>
> > > > 
> > > > > > > > > 
> > > > > > > > > I did reproduce:
> > > > > > > > > 
> > > > > > > > > v5.8-rc3 + series => imx6 hang at boot time
> > > > > > > > > v5.8-rc3 => imx6 boots correctly
> > > > > 
> > > > > So finally I succeeded to reproduce it on my imx7 locally.
> > > > > The sensor
> > > > > was failing to initialize for another reason related to the
> > > > > legacy
> > > > > cooling device, this is why it is not appearing on the imx7.
> > > > > 
> > > > > I can now git-bisect :)
> > > > > 
> > > > 
> > > > That would be very kind of you, thank you!
> > > 
> > > With the lock correctness option enabled:
> > > 
> > > [    4.179223] imx_thermal tempmon: Extended Commercial CPU
> > > temperature
> > > grade - max:105C critical:100C passive:95C
> > > [    4.189557]
> > > [    4.191060] ============================================
> > > [    4.196378] WARNING: possible recursive locking detected
> > > [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
> > > [    4.207102] --------------------------------------------
> > > [    4.212421] kworker/0:3/54 is trying to acquire lock:
> > > [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_device_is_enabled+0x18/0x34
> > > [    4.225777]
> > > [    4.225777] but task is already holding lock:
> > > [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [    4.239121]
> > > [    4.239121] other info that might help us debug this:
> > > [    4.245655]  Possible unsafe locking scenario:
> > > [    4.245655]
> > > [    4.251579]        CPU0
> > > [    4.254031]        ----
> > > [    4.256481]   lock(&tz->lock);
> > > [    4.259544]   lock(&tz->lock);
> > > [    4.262608]
> > > [    4.262608]  *** DEADLOCK ***
> > > [    4.262608]
> > > [    4.268533]  May be due to missing lock nesting notation
> > > [    4.268533]
> > > [    4.275329] 4 locks held by kworker/0:3/54:
> > > [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, 
> > > at:
> > > process_one_work+0x224/0x808
> > > [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0},
> > > at:
> > > process_one_work+0x224/0x808
> > > [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
> > > __device_attach+0x30/0x140
> > > [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [    4.312408]
> > > [    4.312408] stack backtrace:
> > > [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
> > > 5.8.0-rc3-00011-gf5e50bf4d3ef #42
> > > [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
> > > [    4.330809] Workqueue: events deferred_probe_work_func
> > > [    4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
> > > (show_stack+0x10/0x14)
> > > [    4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
> > > (dump_stack+0xe8/0x114)
> > > [    4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
> > > (__lock_acquire+0xbfc/0x2cb4)
> > > [    4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
> > > (lock_acquire+0xf4/0x4e4)
> > > [    4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
> > > (__mutex_lock+0xb0/0xaa8)
> > > [    4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
> > > (mutex_lock_nested+0x1c/0x24)
> > > [    4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
> > > (thermal_zone_device_is_enabled+0x18/0x34)
> > > [    4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
> > > [<c0d9da90>] (imx_get_temp+0x30/0x208)
> > > [    4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
> > > (thermal_zone_get_temp+0x4c/0x6c)
> > > [    4.409640] [<c0d97484>] (thermal_zone_get_temp) from
> > > [<c0d93df0>]
> > > (thermal_zone_device_update+0x8c/0x258)
> > > [    4.419310] [<c0d93df0>] (thermal_zone_device_update) from
> > > [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
> > > [    4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
> > > [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
> > > [    4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
> > > (platform_drv_probe+0x48/0x98)
> > > [    4.447622] [<c0a78388>] (platform_drv_probe) from
> > > [<c0a7603c>]
> > > (really_probe+0x218/0x348)
> > > [    4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
> > > (driver_probe_device+0x5c/0xb4)
> > > [    4.464098] [<c0a76278>] (driver_probe_device) from
> > > [<c0a743bc>]
> > > (bus_for_each_drv+0x58/0xb8)
> > > [    4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
> > > (__device_attach+0xd4/0x140)
> > > [    4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
> > > (bus_probe_device+0x88/0x90)
> > > [    4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
> > > (deferred_probe_work_func+0x68/0x98)
> > > [    4.498088] [<c0a75564>] (deferred_probe_work_func) from
> > > [<c0369988>]
> > > (process_one_work+0x2e0/0x808)
> > > [    4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
> > > (worker_thread+0x2a0/0x59c)
> > > [    4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
> > > (kthread+0x16c/0x178)
> > > [    4.522843] [<c0372208>] (kthread) from [<c0300174>]
> > > (ret_from_fork+0x14/0x20)
> > > [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
> > > [    4.535138] 5fa0:                                     00000000
> > > 00000000 00000000 00000000
> > > [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000 00000000
> > > [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013
> > > 00000000
> > > 
> > > 
> > > 
> > 
> > Thanks!
> > 
> > That confirms your suspicions.
> > 
> > So the reason is that ->get_temp() is called while the mutex is
> > held and
> > thermal_zone_device_is_enabled() wants to take the same mutex.
> 
> Yes, that's correct.
> 
> > Is adding a comment to thermal_zone_device_is_enabled() to never
> > call
> > it while the mutex is held and adding another version of it which
> > does
> > not take the mutex ok?
> 
> The thermal_zone_device_is_enabled() is only used in two places, acpi
> and this imx driver, and given:
> 
> 1. as soon as the mutex is released, there is no guarantee the
> thermal
> zone won't be changed right after, the lock is pointless, thus the
> information also.
> 
> 2. from a design point of view, I don't see why a driver should know
> if
> a thermal zone is disabled or not
> 
> It would make sense to end with this function and do not give the
> different drivers an opportunity to access this information.

I agree.
> 
> Why not add change_mode for the acpi in order to enable or disable
> the
> events

thermal_zone_device_is_enabled() is invoked in acpi thermal driver
because we only want to do thermal_zone_device_update() when the acpi
thermal zone is enabled.

As thermal_zone_device_update() can handle a disabled thermal zone now,
we can just remove the check.

thanks,
rui

>  and for imx_thermal use irq_enabled flag instead of the thermal
> zone mode? Moreover it is very unclear why this function is needed in
> imx_get_temp(), and I suspect we should be able to get rid of it.
> 
> 


WARNING: multiple messages have this Message-ID (diff)
From: Zhang Rui <rui.zhang@intel.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>,
	Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
	netdev@vger.kernel.org, linux-wireless@vger.kernel.org,
	platform-driver-x86@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-renesas-soc@vger.kernel.org,
	linux-rockchip@lists.infradead.org
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Len Brown <lenb@kernel.org>, Vishal Kulkarni <vishal@chelsio.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jiri Pirko <jiri@mellanox.com>,
	Ido Schimmel <idosch@mellanox.com>,
	Johannes Berg <johannes.berg@intel.com>,
	Emmanuel Grumbach <emmanuel.grumbach@intel.com>,
	Luca Coelho <luciano.coelho@intel.com>,
	Intel Linux Wireless <linuxwifi@intel.com>,
	Kalle Valo <kvalo@codeaurora.org>, Peter Kaestle <peter@piie.net>,
	Darren Hart <dvhart@infradead.org>,
	Andy Shevchenko <andy@infradead.org>,
	Sebastian Reichel <sre@kernel.org>,
	Miquel Raynal <miquel.raynal@bootlin.com>,
	Amit Kucheria <amit.kucheria@verdurent.com>,
	Support Opensource <support.opensource@diasemi.com>,
	Shawn Guo <shawnguo@kernel.org>
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices
Date: Fri, 03 Jul 2020 09:49:15 +0800	[thread overview]
Message-ID: <44c622dd7de8c7bf143c4435c0edd1b98d09a3d6.camel@intel.com> (raw)
In-Reply-To: <d41bf28f-ee91-6946-2334-f11ec81f96fe@linaro.org>

On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote:
> On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote:
> > Hi,
> > 
> > W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
> > > On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
> > > > Hi Daniel,
> > > > 
> > > > <snip>
> > > > 
> > > > > > > > > 
> > > > > > > > > I did reproduce:
> > > > > > > > > 
> > > > > > > > > v5.8-rc3 + series => imx6 hang at boot time
> > > > > > > > > v5.8-rc3 => imx6 boots correctly
> > > > > 
> > > > > So finally I succeeded to reproduce it on my imx7 locally.
> > > > > The sensor
> > > > > was failing to initialize for another reason related to the
> > > > > legacy
> > > > > cooling device, this is why it is not appearing on the imx7.
> > > > > 
> > > > > I can now git-bisect :)
> > > > > 
> > > > 
> > > > That would be very kind of you, thank you!
> > > 
> > > With the lock correctness option enabled:
> > > 
> > > [    4.179223] imx_thermal tempmon: Extended Commercial CPU
> > > temperature
> > > grade - max:105C critical:100C passive:95C
> > > [    4.189557]
> > > [    4.191060] ============================================
> > > [    4.196378] WARNING: possible recursive locking detected
> > > [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
> > > [    4.207102] --------------------------------------------
> > > [    4.212421] kworker/0:3/54 is trying to acquire lock:
> > > [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_device_is_enabled+0x18/0x34
> > > [    4.225777]
> > > [    4.225777] but task is already holding lock:
> > > [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [    4.239121]
> > > [    4.239121] other info that might help us debug this:
> > > [    4.245655]  Possible unsafe locking scenario:
> > > [    4.245655]
> > > [    4.251579]        CPU0
> > > [    4.254031]        ----
> > > [    4.256481]   lock(&tz->lock);
> > > [    4.259544]   lock(&tz->lock);
> > > [    4.262608]
> > > [    4.262608]  *** DEADLOCK ***
> > > [    4.262608]
> > > [    4.268533]  May be due to missing lock nesting notation
> > > [    4.268533]
> > > [    4.275329] 4 locks held by kworker/0:3/54:
> > > [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, 
> > > at:
> > > process_one_work+0x224/0x808
> > > [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0},
> > > at:
> > > process_one_work+0x224/0x808
> > > [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
> > > __device_attach+0x30/0x140
> > > [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [    4.312408]
> > > [    4.312408] stack backtrace:
> > > [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
> > > 5.8.0-rc3-00011-gf5e50bf4d3ef #42
> > > [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
> > > [    4.330809] Workqueue: events deferred_probe_work_func
> > > [    4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
> > > (show_stack+0x10/0x14)
> > > [    4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
> > > (dump_stack+0xe8/0x114)
> > > [    4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
> > > (__lock_acquire+0xbfc/0x2cb4)
> > > [    4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
> > > (lock_acquire+0xf4/0x4e4)
> > > [    4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
> > > (__mutex_lock+0xb0/0xaa8)
> > > [    4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
> > > (mutex_lock_nested+0x1c/0x24)
> > > [    4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
> > > (thermal_zone_device_is_enabled+0x18/0x34)
> > > [    4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
> > > [<c0d9da90>] (imx_get_temp+0x30/0x208)
> > > [    4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
> > > (thermal_zone_get_temp+0x4c/0x6c)
> > > [    4.409640] [<c0d97484>] (thermal_zone_get_temp) from
> > > [<c0d93df0>]
> > > (thermal_zone_device_update+0x8c/0x258)
> > > [    4.419310] [<c0d93df0>] (thermal_zone_device_update) from
> > > [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
> > > [    4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
> > > [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
> > > [    4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
> > > (platform_drv_probe+0x48/0x98)
> > > [    4.447622] [<c0a78388>] (platform_drv_probe) from
> > > [<c0a7603c>]
> > > (really_probe+0x218/0x348)
> > > [    4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
> > > (driver_probe_device+0x5c/0xb4)
> > > [    4.464098] [<c0a76278>] (driver_probe_device) from
> > > [<c0a743bc>]
> > > (bus_for_each_drv+0x58/0xb8)
> > > [    4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
> > > (__device_attach+0xd4/0x140)
> > > [    4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
> > > (bus_probe_device+0x88/0x90)
> > > [    4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
> > > (deferred_probe_work_func+0x68/0x98)
> > > [    4.498088] [<c0a75564>] (deferred_probe_work_func) from
> > > [<c0369988>]
> > > (process_one_work+0x2e0/0x808)
> > > [    4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
> > > (worker_thread+0x2a0/0x59c)
> > > [    4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
> > > (kthread+0x16c/0x178)
> > > [    4.522843] [<c0372208>] (kthread) from [<c0300174>]
> > > (ret_from_fork+0x14/0x20)
> > > [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
> > > [    4.535138] 5fa0:                                     00000000
> > > 00000000 00000000 00000000
> > > [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000 00000000
> > > [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013
> > > 00000000
> > > 
> > > 
> > > 
> > 
> > Thanks!
> > 
> > That confirms your suspicions.
> > 
> > So the reason is that ->get_temp() is called while the mutex is
> > held and
> > thermal_zone_device_is_enabled() wants to take the same mutex.
> 
> Yes, that's correct.
> 
> > Is adding a comment to thermal_zone_device_is_enabled() to never
> > call
> > it while the mutex is held and adding another version of it which
> > does
> > not take the mutex ok?
> 
> The thermal_zone_device_is_enabled() is only used in two places, acpi
> and this imx driver, and given:
> 
> 1. as soon as the mutex is released, there is no guarantee the
> thermal
> zone won't be changed right after, the lock is pointless, thus the
> information also.
> 
> 2. from a design point of view, I don't see why a driver should know
> if
> a thermal zone is disabled or not
> 
> It would make sense to end with this function and do not give the
> different drivers an opportunity to access this information.

I agree.
> 
> Why not add change_mode for the acpi in order to enable or disable
> the
> events

thermal_zone_device_is_enabled() is invoked in acpi thermal driver
because we only want to do thermal_zone_device_update() when the acpi
thermal zone is enabled.

As thermal_zone_device_update() can handle a disabled thermal zone now,
we can just remove the check.

thanks,
rui

>  and for imx_thermal use irq_enabled flag instead of the thermal
> zone mode? Moreover it is very unclear why this function is needed in
> imx_get_temp(), and I suspect we should be able to get rid of it.
> 
> 

WARNING: multiple messages have this Message-ID (diff)
From: Zhang Rui <rui.zhang@intel.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>,
	Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
	 netdev@vger.kernel.org, linux-wireless@vger.kernel.org,
	 platform-driver-x86@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	 linux-renesas-soc@vger.kernel.org,
	linux-rockchip@lists.infradead.org
Cc: "Emmanuel Grumbach" <emmanuel.grumbach@intel.com>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	"Vishal Kulkarni" <vishal@chelsio.com>,
	"Luca Coelho" <luciano.coelho@intel.com>,
	"Miquel Raynal" <miquel.raynal@bootlin.com>,
	kernel@collabora.com, "Fabio Estevam" <festevam@gmail.com>,
	"Amit Kucheria" <amit.kucheria@verdurent.com>,
	"Chunyan Zhang" <zhang.lyra@gmail.com>,
	"Bartlomiej Zolnierkiewicz" <b.zolnierkie@samsung.com>,
	"Allison Randal" <allison@lohutok.net>,
	"NXP Linux Team" <linux-imx@nxp.com>,
	"Darren Hart" <dvhart@infradead.org>,
	"Gayatri Kammela" <gayatri.kammela@intel.com>,
	"Len Brown" <lenb@kernel.org>,
	"Johannes Berg" <johannes.berg@intel.com>,
	"Intel Linux Wireless" <linuxwifi@intel.com>,
	"Sascha Hauer" <s.hauer@pengutronix.de>,
	"Ido Schimmel" <idosch@mellanox.com>,
	"Baolin Wang" <baolin.wang7@gmail.com>,
	"Jiri Pirko" <jiri@mellanox.com>,
	"Orson Zhai" <orsonzhai@gmail.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Kalle Valo" <kvalo@codeaurora.org>,
	"Support Opensource" <support.opensource@diasemi.com>,
	"Enrico Weigelt" <info@metux.net>,
	"Peter Kaestle" <peter@piie.net>,
	"Sebastian Reichel" <sre@kernel.org>,
	"Pengutronix Kernel Team" <kernel@pengutronix.de>,
	"Niklas Söderlund" <niklas.soderlund@ragnatech.se>,
	"Shawn Guo" <shawnguo@kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	"Andy Shevchenko" <andy@infradead.org>
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices
Date: Fri, 03 Jul 2020 09:49:15 +0800	[thread overview]
Message-ID: <44c622dd7de8c7bf143c4435c0edd1b98d09a3d6.camel@intel.com> (raw)
In-Reply-To: <d41bf28f-ee91-6946-2334-f11ec81f96fe@linaro.org>

On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote:
> On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote:
> > Hi,
> > 
> > W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
> > > On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
> > > > Hi Daniel,
> > > > 
> > > > <snip>
> > > > 
> > > > > > > > > 
> > > > > > > > > I did reproduce:
> > > > > > > > > 
> > > > > > > > > v5.8-rc3 + series => imx6 hang at boot time
> > > > > > > > > v5.8-rc3 => imx6 boots correctly
> > > > > 
> > > > > So finally I succeeded to reproduce it on my imx7 locally.
> > > > > The sensor
> > > > > was failing to initialize for another reason related to the
> > > > > legacy
> > > > > cooling device, this is why it is not appearing on the imx7.
> > > > > 
> > > > > I can now git-bisect :)
> > > > > 
> > > > 
> > > > That would be very kind of you, thank you!
> > > 
> > > With the lock correctness option enabled:
> > > 
> > > [    4.179223] imx_thermal tempmon: Extended Commercial CPU
> > > temperature
> > > grade - max:105C critical:100C passive:95C
> > > [    4.189557]
> > > [    4.191060] ============================================
> > > [    4.196378] WARNING: possible recursive locking detected
> > > [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
> > > [    4.207102] --------------------------------------------
> > > [    4.212421] kworker/0:3/54 is trying to acquire lock:
> > > [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_device_is_enabled+0x18/0x34
> > > [    4.225777]
> > > [    4.225777] but task is already holding lock:
> > > [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [    4.239121]
> > > [    4.239121] other info that might help us debug this:
> > > [    4.245655]  Possible unsafe locking scenario:
> > > [    4.245655]
> > > [    4.251579]        CPU0
> > > [    4.254031]        ----
> > > [    4.256481]   lock(&tz->lock);
> > > [    4.259544]   lock(&tz->lock);
> > > [    4.262608]
> > > [    4.262608]  *** DEADLOCK ***
> > > [    4.262608]
> > > [    4.268533]  May be due to missing lock nesting notation
> > > [    4.268533]
> > > [    4.275329] 4 locks held by kworker/0:3/54:
> > > [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, 
> > > at:
> > > process_one_work+0x224/0x808
> > > [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0},
> > > at:
> > > process_one_work+0x224/0x808
> > > [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
> > > __device_attach+0x30/0x140
> > > [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [    4.312408]
> > > [    4.312408] stack backtrace:
> > > [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
> > > 5.8.0-rc3-00011-gf5e50bf4d3ef #42
> > > [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
> > > [    4.330809] Workqueue: events deferred_probe_work_func
> > > [    4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
> > > (show_stack+0x10/0x14)
> > > [    4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
> > > (dump_stack+0xe8/0x114)
> > > [    4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
> > > (__lock_acquire+0xbfc/0x2cb4)
> > > [    4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
> > > (lock_acquire+0xf4/0x4e4)
> > > [    4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
> > > (__mutex_lock+0xb0/0xaa8)
> > > [    4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
> > > (mutex_lock_nested+0x1c/0x24)
> > > [    4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
> > > (thermal_zone_device_is_enabled+0x18/0x34)
> > > [    4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
> > > [<c0d9da90>] (imx_get_temp+0x30/0x208)
> > > [    4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
> > > (thermal_zone_get_temp+0x4c/0x6c)
> > > [    4.409640] [<c0d97484>] (thermal_zone_get_temp) from
> > > [<c0d93df0>]
> > > (thermal_zone_device_update+0x8c/0x258)
> > > [    4.419310] [<c0d93df0>] (thermal_zone_device_update) from
> > > [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
> > > [    4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
> > > [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
> > > [    4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
> > > (platform_drv_probe+0x48/0x98)
> > > [    4.447622] [<c0a78388>] (platform_drv_probe) from
> > > [<c0a7603c>]
> > > (really_probe+0x218/0x348)
> > > [    4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
> > > (driver_probe_device+0x5c/0xb4)
> > > [    4.464098] [<c0a76278>] (driver_probe_device) from
> > > [<c0a743bc>]
> > > (bus_for_each_drv+0x58/0xb8)
> > > [    4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
> > > (__device_attach+0xd4/0x140)
> > > [    4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
> > > (bus_probe_device+0x88/0x90)
> > > [    4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
> > > (deferred_probe_work_func+0x68/0x98)
> > > [    4.498088] [<c0a75564>] (deferred_probe_work_func) from
> > > [<c0369988>]
> > > (process_one_work+0x2e0/0x808)
> > > [    4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
> > > (worker_thread+0x2a0/0x59c)
> > > [    4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
> > > (kthread+0x16c/0x178)
> > > [    4.522843] [<c0372208>] (kthread) from [<c0300174>]
> > > (ret_from_fork+0x14/0x20)
> > > [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
> > > [    4.535138] 5fa0:                                     00000000
> > > 00000000 00000000 00000000
> > > [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000 00000000
> > > [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013
> > > 00000000
> > > 
> > > 
> > > 
> > 
> > Thanks!
> > 
> > That confirms your suspicions.
> > 
> > So the reason is that ->get_temp() is called while the mutex is
> > held and
> > thermal_zone_device_is_enabled() wants to take the same mutex.
> 
> Yes, that's correct.
> 
> > Is adding a comment to thermal_zone_device_is_enabled() to never
> > call
> > it while the mutex is held and adding another version of it which
> > does
> > not take the mutex ok?
> 
> The thermal_zone_device_is_enabled() is only used in two places, acpi
> and this imx driver, and given:
> 
> 1. as soon as the mutex is released, there is no guarantee the
> thermal
> zone won't be changed right after, the lock is pointless, thus the
> information also.
> 
> 2. from a design point of view, I don't see why a driver should know
> if
> a thermal zone is disabled or not
> 
> It would make sense to end with this function and do not give the
> different drivers an opportunity to access this information.

I agree.
> 
> Why not add change_mode for the acpi in order to enable or disable
> the
> events

thermal_zone_device_is_enabled() is invoked in acpi thermal driver
because we only want to do thermal_zone_device_update() when the acpi
thermal zone is enabled.

As thermal_zone_device_update() can handle a disabled thermal zone now,
we can just remove the check.

thanks,
rui

>  and for imx_thermal use irq_enabled flag instead of the thermal
> zone mode? Moreover it is very unclear why this function is needed in
> imx_get_temp(), and I suspect we should be able to get rid of it.
> 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2020-07-03  1:49 UTC|newest]

Thread overview: 117+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-29 12:29 [PATCH v7 00/11] Stop monitoring disabled devices Andrzej Pietrasiewicz
2020-06-29 12:29 ` Andrzej Pietrasiewicz
2020-06-29 12:29 ` Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 01/11] acpi: thermal: Fix error handling in the register function Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 02/11] thermal: Store thermal mode in a dedicated enum Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 03/11] thermal: Add current mode to thermal zone device Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 04/11] thermal: Store device mode in struct thermal_zone_device Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 05/11] thermal: remove get_mode() operation of drivers Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 06/11] thermal: Add mode helpers Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 07/11] thermal: Use mode helpers in drivers Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:45   ` Amit Kucheria
2020-06-29 12:45     ` Amit Kucheria
2020-06-29 12:45     ` Amit Kucheria
2020-06-29 14:14   ` Bartlomiej Zolnierkiewicz
2020-06-29 14:14     ` Bartlomiej Zolnierkiewicz
2020-06-29 14:14     ` Bartlomiej Zolnierkiewicz
2020-06-29 12:29 ` [PATCH v7 08/11] thermal: Explicitly enable non-changing thermal zone devices Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 14:15   ` Bartlomiej Zolnierkiewicz
2020-06-29 14:15     ` Bartlomiej Zolnierkiewicz
2020-06-29 14:15     ` Bartlomiej Zolnierkiewicz
2020-06-30  5:07   ` Amit Kucheria
2020-06-30  5:07     ` Amit Kucheria
2020-06-30  5:07     ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 09/11] thermal: core: Stop polling DISABLED thermal devices Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:46   ` Amit Kucheria
2020-06-29 12:46     ` Amit Kucheria
2020-06-29 12:46     ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 10/11] thermal: Simplify or eliminate unnecessary set_mode() methods Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:47   ` Amit Kucheria
2020-06-29 12:47     ` Amit Kucheria
2020-06-29 12:47     ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 11/11] thermal: Rename set_mode() to change_mode() Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:29   ` Andrzej Pietrasiewicz
2020-06-29 12:48   ` Amit Kucheria
2020-06-29 12:48     ` Amit Kucheria
2020-06-29 12:48     ` Amit Kucheria
2020-06-30 12:57 ` [PATCH v7 00/11] Stop monitoring disabled devices Daniel Lezcano
2020-06-30 12:57   ` Daniel Lezcano
2020-06-30 12:57   ` Daniel Lezcano
2020-06-30 13:43   ` Andrzej Pietrasiewicz
2020-06-30 13:43     ` Andrzej Pietrasiewicz
2020-06-30 13:43     ` Andrzej Pietrasiewicz
2020-06-30 14:53     ` Daniel Lezcano
2020-06-30 14:53       ` Daniel Lezcano
2020-06-30 14:53       ` Daniel Lezcano
2020-06-30 15:29       ` Andrzej Pietrasiewicz
2020-06-30 15:29         ` Andrzej Pietrasiewicz
2020-06-30 15:29         ` Andrzej Pietrasiewicz
2020-06-30 15:53         ` Daniel Lezcano
2020-06-30 15:53           ` Daniel Lezcano
2020-06-30 15:53           ` Daniel Lezcano
2020-06-30 16:56           ` Andrzej Pietrasiewicz
2020-06-30 16:56             ` Andrzej Pietrasiewicz
2020-06-30 16:56             ` Andrzej Pietrasiewicz
2020-06-30 18:33             ` Daniel Lezcano
2020-06-30 18:33               ` Daniel Lezcano
2020-06-30 18:33               ` Daniel Lezcano
2020-07-01 10:23               ` Andrzej Pietrasiewicz
2020-07-01 10:23                 ` Andrzej Pietrasiewicz
2020-07-01 10:23                 ` Andrzej Pietrasiewicz
2020-07-01 14:25                 ` Daniel Lezcano
2020-07-01 14:25                   ` Daniel Lezcano
2020-07-01 14:25                   ` Daniel Lezcano
2020-07-02 13:47                 ` Daniel Lezcano
2020-07-02 13:47                   ` Daniel Lezcano
2020-07-02 13:47                   ` Daniel Lezcano
2020-07-02 13:53                   ` Andrzej Pietrasiewicz
2020-07-02 13:53                     ` Andrzej Pietrasiewicz
2020-07-02 13:53                     ` Andrzej Pietrasiewicz
2020-07-02 14:58                     ` Daniel Lezcano
2020-07-02 14:58                       ` Daniel Lezcano
2020-07-02 14:58                       ` Daniel Lezcano
2020-07-02 17:01                     ` Daniel Lezcano
2020-07-02 17:01                       ` Daniel Lezcano
2020-07-02 17:01                       ` Daniel Lezcano
2020-07-02 17:19                       ` Andrzej Pietrasiewicz
2020-07-02 17:19                         ` Andrzej Pietrasiewicz
2020-07-02 17:19                         ` Andrzej Pietrasiewicz
2020-07-02 17:49                         ` Daniel Lezcano
2020-07-02 17:49                           ` Daniel Lezcano
2020-07-02 17:49                           ` Daniel Lezcano
2020-07-02 17:52                           ` Daniel Lezcano
2020-07-02 17:52                             ` Daniel Lezcano
2020-07-02 17:52                             ` Daniel Lezcano
2020-07-03  1:49                           ` Zhang Rui [this message]
2020-07-03  1:49                             ` Zhang Rui
2020-07-03  1:49                             ` Zhang Rui
2020-07-03  6:38                             ` Daniel Lezcano
2020-07-03  6:38                               ` Daniel Lezcano
2020-07-03  6:38                               ` Daniel Lezcano
2020-07-03 10:45                               ` Andrzej Pietrasiewicz
2020-07-03 10:45                                 ` Andrzej Pietrasiewicz
2020-07-03 10:45                                 ` Andrzej Pietrasiewicz
2020-07-03 11:05                                 ` Daniel Lezcano
2020-07-03 11:05                                   ` Daniel Lezcano
2020-07-03 11:05                                   ` Daniel Lezcano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44c622dd7de8c7bf143c4435c0edd1b98d09a3d6.camel@intel.com \
    --to=rui.zhang@intel.com \
    --cc=allison@lohutok.net \
    --cc=amit.kucheria@verdurent.com \
    --cc=andrzej.p@collabora.com \
    --cc=andy@infradead.org \
    --cc=b.zolnierkie@samsung.com \
    --cc=baolin.wang7@gmail.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=davem@davemloft.net \
    --cc=dvhart@infradead.org \
    --cc=emmanuel.grumbach@intel.com \
    --cc=festevam@gmail.com \
    --cc=gayatri.kammela@intel.com \
    --cc=heiko@sntech.de \
    --cc=idosch@mellanox.com \
    --cc=info@metux.net \
    --cc=jiri@mellanox.com \
    --cc=johannes.berg@intel.com \
    --cc=kernel@collabora.com \
    --cc=kernel@pengutronix.de \
    --cc=kvalo@codeaurora.org \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-imx@nxp.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=linuxwifi@intel.com \
    --cc=luciano.coelho@intel.com \
    --cc=miquel.raynal@bootlin.com \
    --cc=netdev@vger.kernel.org \
    --cc=niklas.soderlund@ragnatech.se \
    --cc=orsonzhai@gmail.com \
    --cc=peter@piie.net \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=s.hauer@pengutronix.de \
    --cc=shawnguo@kernel.org \
    --cc=sre@kernel.org \
    --cc=support.opensource@diasemi.com \
    --cc=tglx@linutronix.de \
    --cc=vishal@chelsio.com \
    --cc=zhang.lyra@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.