Linux-Renesas-SoC Archive on lore.kernel.org
 help / color / Atom feed
From: Zhang Rui <rui.zhang@intel.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>,
	Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	linux-pm@vger.kernel.org, linux-acpi@vger.kernel.org,
	netdev@vger.kernel.org, linux-wireless@vger.kernel.org,
	platform-driver-x86@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-renesas-soc@vger.kernel.org,
	linux-rockchip@lists.infradead.org
Cc: "Rafael J . Wysocki" <rjw@rjwysocki.net>,
	"Len Brown" <lenb@kernel.org>,
	"Vishal Kulkarni" <vishal@chelsio.com>,
	"David S . Miller" <davem@davemloft.net>,
	"Jiri Pirko" <jiri@mellanox.com>,
	"Ido Schimmel" <idosch@mellanox.com>,
	"Johannes Berg" <johannes.berg@intel.com>,
	"Emmanuel Grumbach" <emmanuel.grumbach@intel.com>,
	"Luca Coelho" <luciano.coelho@intel.com>,
	"Intel Linux Wireless" <linuxwifi@intel.com>,
	"Kalle Valo" <kvalo@codeaurora.org>,
	"Peter Kaestle" <peter@piie.net>,
	"Darren Hart" <dvhart@infradead.org>,
	"Andy Shevchenko" <andy@infradead.org>,
	"Sebastian Reichel" <sre@kernel.org>,
	"Miquel Raynal" <miquel.raynal@bootlin.com>,
	"Amit Kucheria" <amit.kucheria@verdurent.com>,
	"Support Opensource" <support.opensource@diasemi.com>,
	"Shawn Guo" <shawnguo@kernel.org>,
	"Sascha Hauer" <s.hauer@pengutronix.de>,
	"Pengutronix Kernel Team" <kernel@pengutronix.de>,
	"Fabio Estevam" <festevam@gmail.com>,
	"NXP Linux Team" <linux-imx@nxp.com>,
	"Niklas Söderlund" <niklas.soderlund@ragnatech.se>,
	"Heiko Stuebner" <heiko@sntech.de>,
	"Orson Zhai" <orsonzhai@gmail.com>,
	"Baolin Wang" <baolin.wang7@gmail.com>,
	"Chunyan Zhang" <zhang.lyra@gmail.com>,
	"Allison Randal" <allison@lohutok.net>,
	"Enrico Weigelt" <info@metux.net>,
	"Gayatri Kammela" <gayatri.kammela@intel.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Bartlomiej Zolnierkiewicz" <b.zolnierkie@samsung.com>,
	kernel@collabora.com
Subject: Re: [PATCH v7 00/11] Stop monitoring disabled devices
Date: Fri, 03 Jul 2020 09:49:15 +0800
Message-ID: <44c622dd7de8c7bf143c4435c0edd1b98d09a3d6.camel@intel.com> (raw)
In-Reply-To: <d41bf28f-ee91-6946-2334-f11ec81f96fe@linaro.org>

On Thu, 2020-07-02 at 19:49 +0200, Daniel Lezcano wrote:
> On 02/07/2020 19:19, Andrzej Pietrasiewicz wrote:
> > Hi,
> > 
> > W dniu 02.07.2020 o 19:01, Daniel Lezcano pisze:
> > > On 02/07/2020 15:53, Andrzej Pietrasiewicz wrote:
> > > > Hi Daniel,
> > > > 
> > > > <snip>
> > > > 
> > > > > > > > > 
> > > > > > > > > I did reproduce:
> > > > > > > > > 
> > > > > > > > > v5.8-rc3 + series => imx6 hang at boot time
> > > > > > > > > v5.8-rc3 => imx6 boots correctly
> > > > > 
> > > > > So finally I succeeded to reproduce it on my imx7 locally.
> > > > > The sensor
> > > > > was failing to initialize for another reason related to the
> > > > > legacy
> > > > > cooling device, this is why it is not appearing on the imx7.
> > > > > 
> > > > > I can now git-bisect :)
> > > > > 
> > > > 
> > > > That would be very kind of you, thank you!
> > > 
> > > With the lock correctness option enabled:
> > > 
> > > [    4.179223] imx_thermal tempmon: Extended Commercial CPU
> > > temperature
> > > grade - max:105C critical:100C passive:95C
> > > [    4.189557]
> > > [    4.191060] ============================================
> > > [    4.196378] WARNING: possible recursive locking detected
> > > [    4.201699] 5.8.0-rc3-00011-gf5e50bf4d3ef #42 Not tainted
> > > [    4.207102] --------------------------------------------
> > > [    4.212421] kworker/0:3/54 is trying to acquire lock:
> > > [    4.217480] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_device_is_enabled+0x18/0x34
> > > [    4.225777]
> > > [    4.225777] but task is already holding lock:
> > > [    4.231615] ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [    4.239121]
> > > [    4.239121] other info that might help us debug this:
> > > [    4.245655]  Possible unsafe locking scenario:
> > > [    4.245655]
> > > [    4.251579]        CPU0
> > > [    4.254031]        ----
> > > [    4.256481]   lock(&tz->lock);
> > > [    4.259544]   lock(&tz->lock);
> > > [    4.262608]
> > > [    4.262608]  *** DEADLOCK ***
> > > [    4.262608]
> > > [    4.268533]  May be due to missing lock nesting notation
> > > [    4.268533]
> > > [    4.275329] 4 locks held by kworker/0:3/54:
> > > [    4.279517]  #0: cb0066a8 ((wq_completion)events){+.+.}-{0:0}, 
> > > at:
> > > process_one_work+0x224/0x808
> > > [    4.288241]  #1: ca075f10 (deferred_probe_work){+.+.}-{0:0},
> > > at:
> > > process_one_work+0x224/0x808
> > > [    4.296787]  #2: cb1a48d8 (&dev->mutex){....}-{3:3}, at:
> > > __device_attach+0x30/0x140
> > > [    4.304468]  #3: ca09a3e4 (&tz->lock){+.+.}-{3:3}, at:
> > > thermal_zone_get_temp+0x38/0x6c
> > > [    4.312408]
> > > [    4.312408] stack backtrace:
> > > [    4.316778] CPU: 0 PID: 54 Comm: kworker/0:3 Not tainted
> > > 5.8.0-rc3-00011-gf5e50bf4d3ef #42
> > > [    4.325048] Hardware name: Freescale i.MX7 Dual (Device Tree)
> > > [    4.330809] Workqueue: events deferred_probe_work_func
> > > [    4.335973] [<c0312384>] (unwind_backtrace) from [<c030c580>]
> > > (show_stack+0x10/0x14)
> > > [    4.343734] [<c030c580>] (show_stack) from [<c079d7d8>]
> > > (dump_stack+0xe8/0x114)
> > > [    4.351062] [<c079d7d8>] (dump_stack) from [<c03abf78>]
> > > (__lock_acquire+0xbfc/0x2cb4)
> > > [    4.358909] [<c03abf78>] (__lock_acquire) from [<c03ae9c4>]
> > > (lock_acquire+0xf4/0x4e4)
> > > [    4.366758] [<c03ae9c4>] (lock_acquire) from [<c10630fc>]
> > > (__mutex_lock+0xb0/0xaa8)
> > > [    4.374431] [<c10630fc>] (__mutex_lock) from [<c1063b10>]
> > > (mutex_lock_nested+0x1c/0x24)
> > > [    4.382452] [<c1063b10>] (mutex_lock_nested) from [<c0d932c0>]
> > > (thermal_zone_device_is_enabled+0x18/0x34)
> > > [    4.392036] [<c0d932c0>] (thermal_zone_device_is_enabled) from
> > > [<c0d9da90>] (imx_get_temp+0x30/0x208)
> > > [    4.401271] [<c0d9da90>] (imx_get_temp) from [<c0d97484>]
> > > (thermal_zone_get_temp+0x4c/0x6c)
> > > [    4.409640] [<c0d97484>] (thermal_zone_get_temp) from
> > > [<c0d93df0>]
> > > (thermal_zone_device_update+0x8c/0x258)
> > > [    4.419310] [<c0d93df0>] (thermal_zone_device_update) from
> > > [<c0d9401c>] (thermal_zone_device_set_mode+0x60/0x88)
> > > [    4.429500] [<c0d9401c>] (thermal_zone_device_set_mode) from
> > > [<c0d9e1d4>] (imx_thermal_probe+0x3e4/0x578)
> > > [    4.439082] [<c0d9e1d4>] (imx_thermal_probe) from [<c0a78388>]
> > > (platform_drv_probe+0x48/0x98)
> > > [    4.447622] [<c0a78388>] (platform_drv_probe) from
> > > [<c0a7603c>]
> > > (really_probe+0x218/0x348)
> > > [    4.455903] [<c0a7603c>] (really_probe) from [<c0a76278>]
> > > (driver_probe_device+0x5c/0xb4)
> > > [    4.464098] [<c0a76278>] (driver_probe_device) from
> > > [<c0a743bc>]
> > > (bus_for_each_drv+0x58/0xb8)
> > > [    4.472638] [<c0a743bc>] (bus_for_each_drv) from [<c0a75db0>]
> > > (__device_attach+0xd4/0x140)
> > > [    4.480919] [<c0a75db0>] (__device_attach) from [<c0a750b0>]
> > > (bus_probe_device+0x88/0x90)
> > > [    4.489112] [<c0a750b0>] (bus_probe_device) from [<c0a75564>]
> > > (deferred_probe_work_func+0x68/0x98)
> > > [    4.498088] [<c0a75564>] (deferred_probe_work_func) from
> > > [<c0369988>]
> > > (process_one_work+0x2e0/0x808)
> > > [    4.507237] [<c0369988>] (process_one_work) from [<c036a150>]
> > > (worker_thread+0x2a0/0x59c)
> > > [    4.515432] [<c036a150>] (worker_thread) from [<c0372208>]
> > > (kthread+0x16c/0x178)
> > > [    4.522843] [<c0372208>] (kthread) from [<c0300174>]
> > > (ret_from_fork+0x14/0x20)
> > > [    4.530074] Exception stack(0xca075fb0 to 0xca075ff8)
> > > [    4.535138] 5fa0:                                     00000000
> > > 00000000 00000000 00000000
> > > [    4.543328] 5fc0: 00000000 00000000 00000000 00000000 00000000
> > > 00000000 00000000 00000000
> > > [    4.551516] 5fe0: 00000000 00000000 00000000 00000000 00000013
> > > 00000000
> > > 
> > > 
> > > 
> > 
> > Thanks!
> > 
> > That confirms your suspicions.
> > 
> > So the reason is that ->get_temp() is called while the mutex is
> > held and
> > thermal_zone_device_is_enabled() wants to take the same mutex.
> 
> Yes, that's correct.
> 
> > Is adding a comment to thermal_zone_device_is_enabled() to never
> > call
> > it while the mutex is held and adding another version of it which
> > does
> > not take the mutex ok?
> 
> The thermal_zone_device_is_enabled() is only used in two places, acpi
> and this imx driver, and given:
> 
> 1. as soon as the mutex is released, there is no guarantee the
> thermal
> zone won't be changed right after, the lock is pointless, thus the
> information also.
> 
> 2. from a design point of view, I don't see why a driver should know
> if
> a thermal zone is disabled or not
> 
> It would make sense to end with this function and do not give the
> different drivers an opportunity to access this information.

I agree.
> 
> Why not add change_mode for the acpi in order to enable or disable
> the
> events

thermal_zone_device_is_enabled() is invoked in acpi thermal driver
because we only want to do thermal_zone_device_update() when the acpi
thermal zone is enabled.

As thermal_zone_device_update() can handle a disabled thermal zone now,
we can just remove the check.

thanks,
rui

>  and for imx_thermal use irq_enabled flag instead of the thermal
> zone mode? Moreover it is very unclear why this function is needed in
> imx_get_temp(), and I suspect we should be able to get rid of it.
> 
> 


  parent reply index

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-29 12:29 Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 01/11] acpi: thermal: Fix error handling in the register function Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 02/11] thermal: Store thermal mode in a dedicated enum Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 03/11] thermal: Add current mode to thermal zone device Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 04/11] thermal: Store device mode in struct thermal_zone_device Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 05/11] thermal: remove get_mode() operation of drivers Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 06/11] thermal: Add mode helpers Andrzej Pietrasiewicz
2020-06-29 12:29 ` [PATCH v7 07/11] thermal: Use mode helpers in drivers Andrzej Pietrasiewicz
2020-06-29 12:45   ` Amit Kucheria
2020-06-29 14:14   ` Bartlomiej Zolnierkiewicz
2020-06-29 12:29 ` [PATCH v7 08/11] thermal: Explicitly enable non-changing thermal zone devices Andrzej Pietrasiewicz
2020-06-29 14:15   ` Bartlomiej Zolnierkiewicz
2020-06-30  5:07   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 09/11] thermal: core: Stop polling DISABLED thermal devices Andrzej Pietrasiewicz
2020-06-29 12:46   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 10/11] thermal: Simplify or eliminate unnecessary set_mode() methods Andrzej Pietrasiewicz
2020-06-29 12:47   ` Amit Kucheria
2020-06-29 12:29 ` [PATCH v7 11/11] thermal: Rename set_mode() to change_mode() Andrzej Pietrasiewicz
2020-06-29 12:48   ` Amit Kucheria
2020-06-30 12:57 ` [PATCH v7 00/11] Stop monitoring disabled devices Daniel Lezcano
2020-06-30 13:43   ` Andrzej Pietrasiewicz
2020-06-30 14:53     ` Daniel Lezcano
2020-06-30 15:29       ` Andrzej Pietrasiewicz
2020-06-30 15:53         ` Daniel Lezcano
2020-06-30 16:56           ` Andrzej Pietrasiewicz
2020-06-30 18:33             ` Daniel Lezcano
2020-07-01 10:23               ` Andrzej Pietrasiewicz
2020-07-01 14:25                 ` Daniel Lezcano
2020-07-02 13:47                 ` Daniel Lezcano
2020-07-02 13:53                   ` Andrzej Pietrasiewicz
2020-07-02 14:58                     ` Daniel Lezcano
2020-07-02 17:01                     ` Daniel Lezcano
2020-07-02 17:19                       ` Andrzej Pietrasiewicz
2020-07-02 17:49                         ` Daniel Lezcano
2020-07-02 17:52                           ` Daniel Lezcano
2020-07-03  1:49                           ` Zhang Rui [this message]
2020-07-03  6:38                             ` Daniel Lezcano
2020-07-03 10:45                               ` Andrzej Pietrasiewicz
2020-07-03 11:05                                 ` Daniel Lezcano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=44c622dd7de8c7bf143c4435c0edd1b98d09a3d6.camel@intel.com \
    --to=rui.zhang@intel.com \
    --cc=allison@lohutok.net \
    --cc=amit.kucheria@verdurent.com \
    --cc=andrzej.p@collabora.com \
    --cc=andy@infradead.org \
    --cc=b.zolnierkie@samsung.com \
    --cc=baolin.wang7@gmail.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=davem@davemloft.net \
    --cc=dvhart@infradead.org \
    --cc=emmanuel.grumbach@intel.com \
    --cc=festevam@gmail.com \
    --cc=gayatri.kammela@intel.com \
    --cc=heiko@sntech.de \
    --cc=idosch@mellanox.com \
    --cc=info@metux.net \
    --cc=jiri@mellanox.com \
    --cc=johannes.berg@intel.com \
    --cc=kernel@collabora.com \
    --cc=kernel@pengutronix.de \
    --cc=kvalo@codeaurora.org \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-imx@nxp.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=linuxwifi@intel.com \
    --cc=luciano.coelho@intel.com \
    --cc=miquel.raynal@bootlin.com \
    --cc=netdev@vger.kernel.org \
    --cc=niklas.soderlund@ragnatech.se \
    --cc=orsonzhai@gmail.com \
    --cc=peter@piie.net \
    --cc=platform-driver-x86@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=s.hauer@pengutronix.de \
    --cc=shawnguo@kernel.org \
    --cc=sre@kernel.org \
    --cc=support.opensource@diasemi.com \
    --cc=tglx@linutronix.de \
    --cc=vishal@chelsio.com \
    --cc=zhang.lyra@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Renesas-SoC Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-renesas-soc/0 linux-renesas-soc/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-renesas-soc linux-renesas-soc/ https://lore.kernel.org/linux-renesas-soc \
		linux-renesas-soc@vger.kernel.org
	public-inbox-index linux-renesas-soc

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-renesas-soc


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git