linux-watchdog.vger.kernel.org archive mirror
* Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"
@ 2021-08-03 14:51 Jean Delvare
  2021-08-03 14:59 ` Jan Kiszka
  2021-08-06  6:28 ` Greg KH
  0 siblings, 2 replies; 8+ messages in thread
From: Jean Delvare @ 2021-08-03 14:51 UTC
  To: linux-watchdog, LKML, stable
  Cc: Jan Kiszka, Guenter Roeck, Wim Van Sebroeck, Michael Marley

Hi all,

Commit cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on
second timeout") causes a regression on several systems. Symptoms are:
system reboots automatically after a short period of time if watchdog
is enabled (by systemd for example). This has been reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=213809

Unfortunately this commit was backported to all stable kernel branches
(4.14, 4.19, 5.4, 5.10, 5.12 and 5.13). I'm not sure why that is the
case, BTW, as there is no Fixes tag and no Cc to stable@vger either.
And the fix is not trivial, has apparently not seen enough testing,
and addresses a problem that has a known and simple workaround. IMHO it
should never have been accepted as a stable patch in the first place.
Especially when the previous attempt to fix this issue already ended
with a regression and a revert.

Anyway... After a glance at the patch, I see what looks like a nice
thinko:

+	if (p->smi_res &&
+	    (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))

The author most certainly meant inl(SMI_EN(p)) (the register's value)
and not SMI_EN(p) (the register's address).

-- 
Jean Delvare
SUSE L3 Support


* Re: Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"
  2021-08-03 14:51 Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout" Jean Delvare
@ 2021-08-03 14:59 ` Jan Kiszka
  2021-08-03 15:01   ` Jan Kiszka
                     ` (2 more replies)
  2021-08-06  6:28 ` Greg KH
  1 sibling, 3 replies; 8+ messages in thread
From: Jan Kiszka @ 2021-08-03 14:59 UTC
  To: Jean Delvare, linux-watchdog, LKML, stable
  Cc: Guenter Roeck, Wim Van Sebroeck, Michael Marley

On 03.08.21 16:51, Jean Delvare wrote:
> Hi all,
> 
> Commit cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on
> second timeout") causes a regression on several systems. Symptoms are:
> system reboots automatically after a short period of time if watchdog
> is enabled (by systemd for example). This has been reported in bugzilla:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=213809
> 
> Unfortunately this commit was backported to all stable kernel branches
> (4.14, 4.19, 5.4, 5.10, 5.12 and 5.13). I'm not sure why that is the
> case, BTW, as there is no Fixes tag and no Cc to stable@vger either.
> And the fix is not trivial, has apparently not seen enough testing,
> and addresses a problem that has a known and simple workaround. IMHO it
> should never have been accepted as a stable patch in the first place.
> Especially when the previous attempt to fix this issue already ended
> with a regression and a revert.
> 
> Anyway... After a glance at the patch, I see what looks like a nice
> thinko:
> 
> +	if (p->smi_res &&
> +	    (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
> 
> The author most certainly meant inl(SMI_EN(p)) (the register's value)
> and not SMI_EN(p) (the register's address).
> 

https://lkml.org/lkml/2021/7/26/349

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


* Re: Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"
  2021-08-03 14:59 ` Jan Kiszka
@ 2021-08-03 15:01   ` Jan Kiszka
  2021-08-03 15:27     ` Guenter Roeck
  2021-08-03 15:32   ` Jean Delvare
  2021-08-04  0:04   ` Michael Marley
  2 siblings, 1 reply; 8+ messages in thread
From: Jan Kiszka @ 2021-08-03 15:01 UTC
  To: Jean Delvare, linux-watchdog, LKML, stable
  Cc: Guenter Roeck, Wim Van Sebroeck, Michael Marley

On 03.08.21 16:59, Jan Kiszka wrote:
> On 03.08.21 16:51, Jean Delvare wrote:
>> Hi all,
>>
>> Commit cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on
>> second timeout") causes a regression on several systems. Symptoms are:
>> system reboots automatically after a short period of time if watchdog
>> is enabled (by systemd for example). This has been reported in bugzilla:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=213809
>>
>> Unfortunately this commit was backported to all stable kernel branches
>> (4.14, 4.19, 5.4, 5.10, 5.12 and 5.13). I'm not sure why that is the
>> case, BTW, as there is no Fixes tag and no Cc to stable@vger either.
>> And the fix is not trivial, has apparently not seen enough testing,
>> and addresses a problem that has a known and simple workaround. IMHO it
>> should never have been accepted as a stable patch in the first place.
>> Especially when the previous attempt to fix this issue already ended
>> with a regression and a revert.
>>
>> Anyway... After a glance at the patch, I see what looks like a nice
>> thinko:
>>
>> +	if (p->smi_res &&
>> +	    (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
>>
>> The author most certainly meant inl(SMI_EN(p)) (the register's value)
>> and not SMI_EN(p) (the register's address).
>>
> 
> https://lkml.org/lkml/2021/7/26/349
> 

That's for the fix (in line with your analysis).

I was also wondering if backporting that quickly was needed. Didn't
propose it, though.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


* Re: Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"
  2021-08-03 15:01   ` Jan Kiszka
@ 2021-08-03 15:27     ` Guenter Roeck
  0 siblings, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2021-08-03 15:27 UTC
  To: Jan Kiszka, Jean Delvare, linux-watchdog, LKML, stable
  Cc: Wim Van Sebroeck, Michael Marley

On 8/3/21 8:01 AM, Jan Kiszka wrote:
> On 03.08.21 16:59, Jan Kiszka wrote:
>> On 03.08.21 16:51, Jean Delvare wrote:
>>> Hi all,
>>>
>>> Commit cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on
>>> second timeout") causes a regression on several systems. Symptoms are:
>>> system reboots automatically after a short period of time if watchdog
>>> is enabled (by systemd for example). This has been reported in bugzilla:
>>>
>>> https://bugzilla.kernel.org/show_bug.cgi?id=213809
>>>
>>> Unfortunately this commit was backported to all stable kernel branches
>>> (4.14, 4.19, 5.4, 5.10, 5.12 and 5.13). I'm not sure why that is the
>>> case, BTW, as there is no Fixes tag and no Cc to stable@vger either.
>>> And the fix is not trivial, has apparently not seen enough testing,
>>> and addresses a problem that has a known and simple workaround. IMHO it
>>> should never have been accepted as a stable patch in the first place.
>>> Especially when the previous attempt to fix this issue already ended
>>> with a regression and a revert.
>>>
>>> Anyway... After a glance at the patch, I see what looks like a nice
>>> thinko:
>>>
>>> +	if (p->smi_res &&
>>> +	    (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
>>>
>>> The author most certainly meant inl(SMI_EN(p)) (the register's value)
>>> and not SMI_EN(p) (the register's address).
>>>

Yes, shame on me that I didn't see that.

>>
>> https://lkml.org/lkml/2021/7/26/349
>>
> 
> That's for the fix (in line with your analysis).
> 
> I was also wondering if backporting that quickly was needed. Didn't
> propose it, though.
> 

I'd suggest discussing that with Greg and Sasha. Backporting is pretty
aggressive nowadays.

Guenter


* Re: Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"
  2021-08-03 14:59 ` Jan Kiszka
  2021-08-03 15:01   ` Jan Kiszka
@ 2021-08-03 15:32   ` Jean Delvare
  2021-08-04  0:04   ` Michael Marley
  2 siblings, 0 replies; 8+ messages in thread
From: Jean Delvare @ 2021-08-03 15:32 UTC
  To: Jan Kiszka
  Cc: linux-watchdog, LKML, stable, Guenter Roeck, Wim Van Sebroeck,
	Michael Marley

On Tue, 3 Aug 2021 16:59:02 +0200, Jan Kiszka wrote:
> https://lkml.org/lkml/2021/7/26/349

For the record, I tested this fix successfully on my system (as in: no
more random reboots).

Tested-by: Jean Delvare <jdelvare@suse.de>

Thanks,
-- 
Jean Delvare
SUSE L3 Support


* Re: Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"
  2021-08-03 14:59 ` Jan Kiszka
  2021-08-03 15:01   ` Jan Kiszka
  2021-08-03 15:32   ` Jean Delvare
@ 2021-08-04  0:04   ` Michael Marley
  2021-08-04  6:58     ` Jan Kiszka
  2 siblings, 1 reply; 8+ messages in thread
From: Michael Marley @ 2021-08-04  0:04 UTC
  To: Jan Kiszka, Jean Delvare, linux-watchdog, LKML, stable
  Cc: Guenter Roeck, Wim Van Sebroeck

On 8/3/21 10:59 AM, Jan Kiszka wrote:

> https://lkml.org/lkml/2021/7/26/349

It fixes the problem for me (the person who opened the Bugzilla report) 
too, thanks!

Tested-by: Michael Marley <michael@michaelmarley.com>



* Re: Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"
  2021-08-04  0:04   ` Michael Marley
@ 2021-08-04  6:58     ` Jan Kiszka
  0 siblings, 0 replies; 8+ messages in thread
From: Jan Kiszka @ 2021-08-04  6:58 UTC
  To: Michael Marley, Jean Delvare, linux-watchdog, LKML, stable
  Cc: Guenter Roeck, Wim Van Sebroeck

On 04.08.21 02:04, Michael Marley wrote:
> On 8/3/21 10:59 AM, Jan Kiszka wrote:
> 
>> https://lkml.org/lkml/2021/7/26/349
>>
> 
> It fixes the problem for me (the person who opened the Bugzilla report)
> too, thanks!
> 
> Tested-by: Michael Marley <michael@michaelmarley.com>
> 

Thanks for the confirmation!

Yeah, sorry, that original mistake is truly mine. In fact, I wrote this
code twice [1] but only messed it up here. Unfortunate that it spread so
quickly. I'll try to discuss this with the stable people eventually, to
see whether it was a one-off or whether there are more such cases.

Jan

[1]
https://github.com/siemens/efibootguard/commit/aa89fe3cbd883198c23eaec43c4448fe9e8ae148

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


* Re: Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout"
  2021-08-03 14:51 Faulty commit "watchdog: iTCO_wdt: Account for rebooting on second timeout" Jean Delvare
  2021-08-03 14:59 ` Jan Kiszka
@ 2021-08-06  6:28 ` Greg KH
  1 sibling, 0 replies; 8+ messages in thread
From: Greg KH @ 2021-08-06  6:28 UTC
  To: Jean Delvare
  Cc: linux-watchdog, LKML, stable, Jan Kiszka, Guenter Roeck,
	Wim Van Sebroeck, Michael Marley

On Tue, Aug 03, 2021 at 04:51:08PM +0200, Jean Delvare wrote:
> Hi all,
> 
> Commit cb011044e34c ("watchdog: iTCO_wdt: Account for rebooting on
> second timeout") causes a regression on several systems. Symptoms are:
> system reboots automatically after a short period of time if watchdog
> is enabled (by systemd for example). This has been reported in bugzilla:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=213809
> 
> Unfortunately this commit was backported to all stable kernel branches
> (4.14, 4.19, 5.4, 5.10, 5.12 and 5.13). I'm not sure why that is the
> case, BTW, as there is no Fixes tag and no Cc to stable@vger either.
> And the fix is not trivial, has apparently not seen enough testing,
> and addresses a problem that has a known and simple workaround. IMHO it
> should never have been accepted as a stable patch in the first place.
> Especially when the previous attempt to fix this issue already ended
> with a regression and a revert.
> 
> Anyway... After a glance at the patch, I see what looks like a nice
> thinko:
> 
> +	if (p->smi_res &&
> +	    (SMI_EN(p) & (TCO_EN | GBL_SMI_EN)) != (TCO_EN | GBL_SMI_EN))
> 
> The author most certainly meant inl(SMI_EN(p)) (the register's value)
> and not SMI_EN(p) (the register's address).

Let me go revert this from the stable trees now, thanks for the report.

greg k-h

