linux-watchdog.vger.kernel.org archive mirror
* iTCO_wdt regression on Dell laptop
@ 2021-07-26  9:19 Mantas Mikulėnas
  2021-07-26  9:40 ` Jan Kiszka
  0 siblings, 1 reply; 5+ messages in thread
From: Mantas Mikulėnas @ 2021-07-26  9:19 UTC
  To: Guenter Roeck, Jan Kiszka; +Cc: Wim Van Sebroeck, linux-watchdog

Hello,

I have a Dell Inspiron 15-5547 laptop, with systemd configured to set
the watchdog to a 2-minute expiry (due to reasons):

# /etc/systemd/system.conf
[Manager]
RuntimeWatchdogSec=2min

So far this setting has worked without problems (including kernels
5.12.15 and 5.13.1); however, with kernel 5.13.4 the system inevitably
reboots after a few minutes of uptime.

I have tracked the issue down to commit 5e65819a006e "watchdog:
iTCO_wdt: Account for rebooting on second timeout" in the 5.13.x
branch (commit cb011044e34c upstream). There are no unexpected reboots
when running 5.13.4 with this commit reverted.

Indeed with the original 5.13.4 kernel, `wdctl` always reports
"Timeleft:" counting down from 60 seconds (sometimes very nearly
reaching 0), even though "Timeout" is still reported to be 120.

(systemd pokes the watchdog as part of its main loop, trying to do so
approximately "between 1/4 and 1/2" of the configured interval.
According to wdctl these pings usually happen every 35-50 seconds but
sometimes nearly at the 60-second mark, and thanks to the kernel now
also dividing the requested expiry by 2, which systemd is unaware of,
sometimes this ends up being a *very* close race to 0.)

This is a Haswell-era machine (i7-4510U) and seems to have a "version
0" watchdog:

Jul 26 11:34:04 archlinux kernel: Linux version 5.13.4-arch2-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Thu, 22 Jul 2021 20:46:28 +0000
Jul 26 11:34:14 frost kernel: iTCO_vendor_support: vendor-support=0
Jul 26 11:34:14 frost kernel: iTCO_wdt iTCO_wdt.3.auto: Found a Lynx Point_LP TCO device (Version=2, TCOBASE=0x1860)
Jul 26 11:34:14 frost systemd[1]: Using hardware watchdog 'iTCO_wdt', version 0, device /dev/watchdog
Jul 26 11:34:14 frost systemd[1]: Set hardware watchdog to 2min.
Jul 26 11:34:14 frost kernel: iTCO_wdt iTCO_wdt.3.auto: initialized. heartbeat=30 sec (nowayout=0)

-- 
Mantas Mikulėnas

* Re: iTCO_wdt regression on Dell laptop
  2021-07-26  9:19 iTCO_wdt regression on Dell laptop Mantas Mikulėnas
@ 2021-07-26  9:40 ` Jan Kiszka
  2021-07-26  9:45   ` Jan Kiszka
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2021-07-26  9:40 UTC
  To: Mantas Mikulėnas, Guenter Roeck; +Cc: Wim Van Sebroeck, linux-watchdog

On 26.07.21 11:19, Mantas Mikulėnas wrote:
> Hello,
> 
> I have a Dell Inspiron 15-5547 laptop, with systemd configured to set
> the watchdog to a 2-minute expiry (due to reasons):
> 
> # /etc/systemd/system.conf
> [Manager]
> RuntimeWatchdogSec=2min
> 
> So far this setting has worked without problems (including kernels
> 5.12.15 and 5.13.1); however, with kernel 5.13.4 the system inevitably
> reboots after a few minutes of uptime.
> 
> I have tracked the issue down to commit 5e65819a006e "watchdog:
> iTCO_wdt: Account for rebooting on second timeout" in the 5.13.x
> branch (commit cb011044e34c upstream). There are no unexpected reboots
> when running 5.13.4 with this commit reverted.
> 
> Indeed with the original 5.13.4 kernel, `wdctl` always reports
> "Timeleft:" counting down from 60 seconds (sometimes very nearly
> reaching 0), even though "Timeout" is still reported to be 120.
> 
> (systemd pokes the watchdog as part of its main loop, trying to do so
> approximately "between 1/4 and 1/2" of the configured interval.
> According to wdctl these pings usually happen every 35-50 seconds but
> sometimes nearly at the 60-second mark, and thanks to the kernel now
> also dividing the requested expiry by 2, which systemd is unaware of,
> sometimes this ends up being a *very* close race to 0.)
> 
> This is a Haswell-era machine (i7-4510U) and seems to have a "version
> 0" watchdog:
> 
> Jul 26 11:34:04 archlinux kernel: Linux version 5.13.4-arch2-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Thu, 22 Jul 2021 20:46:28 +0000
> Jul 26 11:34:14 frost kernel: iTCO_vendor_support: vendor-support=0
> Jul 26 11:34:14 frost kernel: iTCO_wdt iTCO_wdt.3.auto: Found a Lynx Point_LP TCO device (Version=2, TCOBASE=0x1860)
> Jul 26 11:34:14 frost systemd[1]: Using hardware watchdog 'iTCO_wdt', version 0, device /dev/watchdog
> Jul 26 11:34:14 frost systemd[1]: Set hardware watchdog to 2min.
> Jul 26 11:34:14 frost kernel: iTCO_wdt iTCO_wdt.3.auto: initialized. heartbeat=30 sec (nowayout=0)
> 

Could you printk SMI_EN(p) in iTCO_wdt_set_timeout()
(drivers/watchdog/iTCO_wdt.c)? This is where we decide whether SMIs are
working and, thus, whether the countdown will only run once. Apparently,
something is wrong with the detection on this system.

Thanks,
Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

* Re: iTCO_wdt regression on Dell laptop
  2021-07-26  9:40 ` Jan Kiszka
@ 2021-07-26  9:45   ` Jan Kiszka
  2021-07-26 16:54     ` Mantas Mikulėnas
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2021-07-26  9:45 UTC
  To: Mantas Mikulėnas, Guenter Roeck; +Cc: Wim Van Sebroeck, linux-watchdog

On 26.07.21 11:40, Jan Kiszka wrote:
> Could you printk SMI_EN(p) in iTCO_wdt_set_timeout()
> (drivers/watchdog/iTCO_wdt.c)? This is where we decide whether SMIs are
> working and, thus, whether the countdown will only run once. Apparently,
> something is wrong with the detection on this system.
> 

Wait, found it:

diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
index b3f604669e2c..643c6c2d0b72 100644
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -362,7 +362,7 @@ static int iTCO_wdt_set_timeout(struct watchdog_device *wd_dev, unsigned int t)
 	 * Otherwise, the BIOS generally reboots when the SMI triggers.
 	 */
 	if (p->smi_res &&
-	    (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
+	    (inl(SMI_EN(p)) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
 		tmrval /= 2;
 
 	/* from the specs: */

Hand me a brown paper bag...

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux

* Re: iTCO_wdt regression on Dell laptop
  2021-07-26  9:45   ` Jan Kiszka
@ 2021-07-26 16:54     ` Mantas Mikulėnas
  2021-07-26 16:56       ` Jan Kiszka
  0 siblings, 1 reply; 5+ messages in thread
From: Mantas Mikulėnas @ 2021-07-26 16:54 UTC
  To: Jan Kiszka; +Cc: Guenter Roeck, Wim Van Sebroeck, linux-watchdog

On Mon, Jul 26, 2021 at 12:45 PM Jan Kiszka <jan.kiszka@siemens.com> wrote:
> Wait, found it:
>
> diff --git a/drivers/watchdog/iTCO_wdt.c b/drivers/watchdog/iTCO_wdt.c
> index b3f604669e2c..643c6c2d0b72 100644
> --- a/drivers/watchdog/iTCO_wdt.c
> +++ b/drivers/watchdog/iTCO_wdt.c
> @@ -362,7 +362,7 @@ static int iTCO_wdt_set_timeout(struct watchdog_device *wd_dev, unsigned int t)
>          * Otherwise, the BIOS generally reboots when the SMI triggers.
>          */
>         if (p->smi_res &&
> -           (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
> +           (inl(SMI_EN(p)) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
>                 tmrval /= 2;
>
>         /* from the specs: */

Rebuilt with this and it fixes the issue, thanks.

-- 
Mantas Mikulėnas

* Re: iTCO_wdt regression on Dell laptop
  2021-07-26 16:54     ` Mantas Mikulėnas
@ 2021-07-26 16:56       ` Jan Kiszka
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Kiszka @ 2021-07-26 16:56 UTC
  To: Mantas Mikulėnas; +Cc: Guenter Roeck, Wim Van Sebroeck, linux-watchdog

On 26.07.21 18:54, Mantas Mikulėnas wrote:
> Rebuilt with this and it fixes the issue, thanks.
> 

Thanks for confirming! Please also reply with a "Tested-by: ..." on the
patch I sent earlier.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
