Subject: Re: iTCO_wdt regression on Dell laptop
To: Mantas Mikulėnas
Cc: Guenter Roeck, Wim Van Sebroeck, linux-watchdog@vger.kernel.org
References: <4f72e518-ef01-505d-2523-6d3b151e1719@siemens.com> <1d07f96c-a8c9-06e5-69ec-2c099df7b1f3@siemens.com>
From: Jan Kiszka
Message-ID: <1428ec88-cdec-2368-6932-2803f57ed041@siemens.com>
Date: Mon, 26 Jul 2021 18:56:44 +0200
X-Mailing-List: linux-watchdog@vger.kernel.org

On 26.07.21 18:54, Mantas Mikulėnas wrote:
> On Mon, Jul 26, 2021 at 12:45 PM Jan Kiszka wrote:
>>
>> On 26.07.21 11:40, Jan Kiszka wrote:
>>> On 26.07.21 11:19, Mantas Mikulėnas wrote:
>>>> Hello,
>>>>
>>>> I have a Dell Inspiron 15-5547 laptop, with systemd configured to set
>>>> the watchdog to a 2-minute expiry (due to reasons):
>>>>
>>>> # /etc/systemd/system.conf
>>>> [Manager]
>>>> RuntimeWatchdogSec=2min
>>>>
>>>> So far this setting has worked without problems (including kernels
>>>> 5.12.15 and 5.13.1); however, with kernel 5.13.4 the system inevitably
>>>> reboots after a few minutes of uptime.
>>>>
>>>> I have tracked the issue down to commit 5e65819a006e "watchdog:
>>>> iTCO_wdt: Account for rebooting on second timeout" in the 5.13.x
>>>> branch (commit cb011044e34c upstream). There are no unexpected reboots
>>>> when running 5.13.4 with this commit reverted.
>>>>
>>>> Indeed with the original 5.13.4 kernel, `wdctl` always reports
>>>> "Timeleft:" counting down from 60 seconds (sometimes very nearly
>>>> reaching 0), even though "Timeout" is still reported to be 120.
>>>>
>>>> (systemd pokes the watchdog as part of its main loop, trying to do so
>>>> approximately "between 1/4 and 1/2" of the configured interval.
>>>> According to wdctl these pings usually happen every 35-50 seconds but
>>>> sometimes nearly at the 60-second mark, and thanks to the kernel now
>>>> also dividing the requested expiry by 2, which systemd is unaware of,
>>>> sometimes this ends up being a *very* close race to 0.)
>>>>
>>>> This is a Haswell-era machine (i7-4510U) and seems to have a "version
>>>> 0" watchdog:
>>>>
>>>> Jul 26 11:34:04 archlinux kernel: Linux version 5.13.4-arch2-1
>>>> (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1
>>>> SMP PREEMPT Thu, 22 Jul 2021 20:46:28 +0000
>>>> Jul 26 11:34:14 frost kernel: iTCO_vendor_support: vendor-support=0
>>>> Jul 26 11:34:14 frost kernel: iTCO_wdt iTCO_wdt.3.auto: Found a Lynx
>>>> Point_LP TCO device (Version=2, TCOBASE=0x1860)
>>>> Jul 26 11:34:14 frost systemd[1]: Using hardware watchdog 'iTCO_wdt',
>>>> version 0, device /dev/watchdog
>>>> Jul 26 11:34:14 frost systemd[1]: Set hardware watchdog to 2min.
>>>> Jul 26 11:34:14 frost kernel: iTCO_wdt iTCO_wdt.3.auto: initialized.
>>>> heartbeat=30 sec (nowayout=0)
>>>>
>>>
>>> Could you printk SMI_EN(p) in iTCO_wdt_set_timeout()
>>> (drivers/watchdog/iTCO_wdt.c)? This is where we decide whether SMIs are
>>> working, thus the countdown will only run once. Apparently, something is
>>> wrong with the detection on this system.
>>>
>>
>> Wait, found it:
>>
>> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
>> index b3f604669e2c..643c6c2d0b72 100644
>> --- a/drivers/watchdog/iTCO_wdt.c
>> +++ b/drivers/watchdog/iTCO_wdt.c
>> @@ -362,7 +362,7 @@ static int iTCO_wdt_set_timeout(struct watchdog_device *wd_dev, unsigned int t)
>>  	 * Otherwise, the BIOS generally reboots when the SMI triggers.
>>  	 */
>>  	if (p->smi_res &&
>> -	    (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
>> +	    (inl(SMI_EN(p)) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
>>  		tmrval /= 2;
>>
>>  	/* from the specs: */
>
> Rebuilt with this and it fixes the issue, thanks.
>

Thanks for confirming! Please also reply with a "Tested-by: ..." on the patch I sent earlier.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
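
For readers of the archive, here is a rough userspace sketch of why the one-line change matters. It assumes SMI_EN(p) expands to the I/O port address of the register rather than to its contents, so the pre-fix check was masking an address. Everything below is hypothetical and only models the logic: struct fake_port, fake_inl(), the port address 0x1830 and the register value are made up for illustration, the bit positions are quoted from memory, and the timeout is kept in seconds where the driver works in ticks. The driver's p->smi_res check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as I recall them from the driver; treat them as assumptions. */
#define TCO_EN     (1u << 13)
#define GBL_SMI_EN (1u << 0)

/* Hypothetical model of one I/O port register: where it lives, what it holds. */
struct fake_port {
	uint32_t addr;   /* port address, what the bare macro would yield */
	uint32_t value;  /* register contents, what a port read would yield */
};

/* Stand-in for inl(): returns the register contents, not the address. */
static uint32_t fake_inl(const struct fake_port *port)
{
	return port->value;
}

/* Pre-fix logic: the mask is applied to the port address. */
static bool smi_enabled_buggy(const struct fake_port *smi_en)
{
	const uint32_t mask = TCO_EN | GBL_SMI_EN;

	return (smi_en->addr & mask) == mask;
}

/* Post-fix logic: the mask is applied to the value read from the port. */
static bool smi_enabled_fixed(const struct fake_port *smi_en)
{
	const uint32_t mask = TCO_EN | GBL_SMI_EN;

	return (fake_inl(smi_en) & mask) == mask;
}

/* Simplified to seconds: halve the timeout unless SMIs look enabled. */
static unsigned int programmed_timeout(unsigned int requested, bool smi_enabled)
{
	return smi_enabled ? requested : requested / 2;
}

/* Walks through the reported scenario with a made-up address and value. */
static void demo(void)
{
	struct fake_port smi_en = {
		.addr = 0x1830,                 /* hypothetical port address */
		.value = TCO_EN | GBL_SMI_EN,   /* both enable bits set */
	};

	/* Masking the address misses both bits, so 120 s is halved. */
	assert(programmed_timeout(120, smi_enabled_buggy(&smi_en)) == 60);
	/* Masking the register contents sees both bits, so 120 s is kept. */
	assert(programmed_timeout(120, smi_enabled_fixed(&smi_en)) == 120);
}
```

With these made-up numbers the buggy variant halves a 120-second request to 60, which would match the "Timeleft" countdown from 60 that wdctl showed while "Timeout" still read 120.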