Message-ID: <6918b1a7ba401cd4db2db0601137766acd93bc63.camel@pengutronix.de>
Subject: Re: [PATCH v1] thermal: imx: Update critical temp threshold
From: Lucas Stach
To: Francesco Dolcini, Daniel Lezcano, Sascha Hauer, Shawn Guo
Cc: linux-pm@vger.kernel.org, Tim Harvey, Amit Kucheria, Jon Nettleton,
 NXP Linux Team, Pengutronix Kernel Team, "Rafael J. Wysocki",
 Fabio Estevam, linux-arm-kernel@lists.infradead.org
Date: Thu, 12 May 2022 12:08:08 +0200
In-Reply-To: <20220512073600.GA36153@francesco-nb.int.toradex.com>
References: <20220420091300.179753-1-francesco.dolcini@toradex.com>
 <486c5c72-812a-d4ea-0c5a-49783bdc4a1f@linaro.org>
 <20220512073600.GA36153@francesco-nb.int.toradex.com>
X-Mailing-List: linux-pm@vger.kernel.org

On Thursday, 12.05.2022 at 09:36 +0200, Francesco Dolcini wrote:
> Hello Daniel, Sasha, Shawn and all
>
> On Mon, May 09, 2022 at 11:55:20AM +0200, Daniel Lezcano wrote:
> > On 20/04/2022 11:13, Francesco Dolcini wrote:
> > > Increase the critical temperature threshold to the datasheet defined
> > > value according to the temperature grade of the SoC, increasing the
> > > actual critical temperature value of 5 degrees.
> > >
> > > Without this change the emergency shutdown will trigger earlier then
> > > required affecting applications that are expected to be working on this
> > > close to the limit, but yet valid, temperature range.
> > >
> > > Signed-off-by: Francesco Dolcini
> > > ---
> > >
> > > Not sure if there is an alternative to this patch, the critical threshold seems
> > > to be read-only and it is not possible to just change it from user space that
> > > would be my preferred solution.
> > >
> > > According to the original discussion [1] the reasoning was the following:
> > >
> > > On Tue, Jul 28, 2015 at 4:50 PM, Tim Harvey wrote:
> > > > Yes - the purpose of lowering the critical threshold from the hardware
> > > > default is to allow Linux to shutdown more cleanly.
> > >
> > > But I do not understand it.
> >
> > Shawn, Sascha ? any comment ?
>
> Just one small addition, we (Toradex) are using this modified critical
> threshold since quite some time, on multiple i.MX[67]* SOC, and we
> regularly run stress tests on commercial/IT part on the whole
> temperature working range (ambient temperature up to 85 degrees for IT
> modules) in climate chambers and I'm not aware of any issue reported
> because of that (indeed, it is the other way around, without this change
> we had issues).

That is really an overall system design issue. Most chips will probably
work fine when going over the critical temperature, as this is mostly
set due to device lifetime constraints, not because the chip fails at
this temperature. However, the chip is only guaranteed to work up to
the critical temperature, so one could argue that starting an orderly
shutdown when the critical temperature is reached is already too late,
as the temperature may rise further during the time taken to shut down
the system.

Also, device leakage increases a lot at those critical temperatures, so
the system may fail not because the chip is malfunctioning, but because
the board power supply may not be able to supply the increased current
required.

Really, I think there is no right or wrong here. I believe that this
needs to be up to the system integrator, so the critical temperature
should be writable by userspace within the constraints set by the fuses.
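To make the last point concrete, here is a minimal sketch of the kind of range check such a writable critical trip point would need. It is not taken from the imx thermal driver: the struct, the field names and the function name are all hypothetical, and it assumes temperatures are kept in millidegrees Celsius and that the fuse-derived maximum has already been read out elsewhere.

```c
#include <errno.h>

/* Hypothetical per-zone state; all temperatures in millidegrees Celsius. */
struct zone_limits {
	int temp_max;      /* upper limit derived from the temperature-grade fuses */
	int temp_passive;  /* passive (throttling) trip point */
	int temp_critical; /* critical (shutdown) trip point */
};

/*
 * Accept a requested critical trip point only if it does not exceed the
 * fuse-derived maximum and stays above the passive trip point.
 * Returns 0 on success, -EINVAL if the request is out of range; on
 * failure the previous critical trip point is left untouched.
 */
static int zone_set_critical_temp(struct zone_limits *z, int temp)
{
	if (temp > z->temp_max)
		return -EINVAL;
	if (temp <= z->temp_passive)
		return -EINVAL;

	z->temp_critical = temp;
	return 0;
}
```

With something along these lines, an integrator who wants more shutdown margin could write a lower value, while one who has validated the board up to the datasheet limit could raise it to the fuse-derived maximum, but not beyond.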
Regards,
Lucas