linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Missing watchdog after ACPI watchdog creation failure
       [not found] <s5hd12h21cg.wl-tiwai@suse.de>
@ 2018-01-10 15:23 ` Mika Westerberg
  2018-01-10 15:43   ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: Mika Westerberg @ 2018-01-10 15:23 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Rafael J. Wysocki, linux-kernel

On Wed, Jan 10, 2018 at 04:13:51PM +0100, Takashi Iwai wrote:
> Hi,
> 
> on the recent kernels, i2c-i801 skips the creation of iTCO wdt when
> ACPI WDAT is present.  It's fine when ACPI really creates the watchdog
> device.  But, we've got a report showing that the watchdog is missing
> on some machines because ACPI failed to create, and yet i2c-i801 still
> skips because acpi_has_watchdog() returns true.
> 
> More specifically, the machine gets an error from acpi_watchdog.c
> like:
>   platform wdat_wdt: failed to claim resource 3: [io 0x040a-0x040c]
>   ACPI: watchdog: Device creation failed: -16
> 
> where the region was registered by pnp,
>   % cat /proc/ioports
>   ....
>   0400-047f : pnp 00:01
> 
> 
> One may say that BIOS sucks, but OTOH, the complete lack of watchdog
> thereafter can be seen as a regression, too.  It used to work on the
> older kernel as iTCO wdt was provided by i2c-i801.

Hmm, if the resource is already taken I wonder how iTCO can work? Are
you sure iTCO works on those systems?

> Shouldn't acpi_has_watchdog() rather checks whether the watchdog
> device creation succeeded or not?

Yes, or rather we should first figure out what the actual problem is ;-)

Are you able to get acpidump from that system with full dmesg?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Missing watchdog after ACPI watchdog creation failure
  2018-01-10 15:23 ` Missing watchdog after ACPI watchdog creation failure Mika Westerberg
@ 2018-01-10 15:43   ` Takashi Iwai
  2018-01-17 11:53     ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2018-01-10 15:43 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Rafael J. Wysocki, linux-kernel

On Wed, 10 Jan 2018 16:23:43 +0100,
Mika Westerberg wrote:
> 
> On Wed, Jan 10, 2018 at 04:13:51PM +0100, Takashi Iwai wrote:
> > Hi,
> > 
> > on the recent kernels, i2c-i801 skips the creation of iTCO wdt when
> > ACPI WDAT is present.  It's fine when ACPI really creates the watchdog
> > device.  But, we've got a report showing that the watchdog is missing
> > on some machines because ACPI failed to create, and yet i2c-i801 still
> > skips because acpi_has_watchdog() returns true.
> > 
> > More specifically, the machine gets an error from acpi_watchdog.c
> > like:
> >   platform wdat_wdt: failed to claim resource 3: [io 0x040a-0x040c]
> >   ACPI: watchdog: Device creation failed: -16
> > 
> > where the region was registered by pnp,
> >   % cat /proc/ioports
> >   ....
> >   0400-047f : pnp 00:01
> > 
> > 
> > One may say that BIOS sucks, but OTOH, the complete lack of watchdog
> > thereafter can be seen as a regression, too.  It used to work on the
> > older kernel as iTCO wdt was provided by i2c-i801.
> 
> Hmm, if the resource is already taken I wonder how iTCO can work? Are
> you sure iTCO works on those systems?

Yes, that's the reason we got a bug report :)
4.4 kernel worked, and 4.12 (and later) don't.

On 4.4.x,
% /proc/ioports
0000-0cf7 : PCI Bus 0000:00
...
  0400-047f : pnp 00:01
    0400-041f : iTCO_wdt
      0400-041f : iTCO_wdt
  0500-0503 : ACPI PM1a_EVT_BLK

On 4.12.x,
% /proc/ioports
0000-0cf7 : PCI Bus 0000:00
...
  0400-047f : pnp 00:01
  0500-053f : pnp 00:01

> > Shouldn't acpi_has_watchdog() rather checks whether the watchdog
> > device creation succeeded or not?
> 
> Yes, or rather we should first figure out what the actual problem is ;-)
> 
> Are you able to get acpidump from that system with full dmesg?

I'll ask the reporter.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Missing watchdog after ACPI watchdog creation failure
  2018-01-10 15:43   ` Takashi Iwai
@ 2018-01-17 11:53     ` Takashi Iwai
  2018-01-18 10:20       ` Mika Westerberg
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2018-01-17 11:53 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Rafael J. Wysocki, linux-kernel

On Wed, 10 Jan 2018 16:43:23 +0100,
Takashi Iwai wrote:
> 
> On Wed, 10 Jan 2018 16:23:43 +0100,
> Mika Westerberg wrote:
> > 
> > On Wed, Jan 10, 2018 at 04:13:51PM +0100, Takashi Iwai wrote:
> > > Hi,
> > > 
> > > on the recent kernels, i2c-i801 skips the creation of iTCO wdt when
> > > ACPI WDAT is present.  It's fine when ACPI really creates the watchdog
> > > device.  But, we've got a report showing that the watchdog is missing
> > > on some machines because ACPI failed to create, and yet i2c-i801 still
> > > skips because acpi_has_watchdog() returns true.
> > > 
> > > More specifically, the machine gets an error from acpi_watchdog.c
> > > like:
> > >   platform wdat_wdt: failed to claim resource 3: [io 0x040a-0x040c]
> > >   ACPI: watchdog: Device creation failed: -16
> > > 
> > > where the region was registered by pnp,
> > >   % cat /proc/ioports
> > >   ....
> > >   0400-047f : pnp 00:01
> > > 
> > > 
> > > One may say that BIOS sucks, but OTOH, the complete lack of watchdog
> > > thereafter can be seen as a regression, too.  It used to work on the
> > > older kernel as iTCO wdt was provided by i2c-i801.
> > 
> > Hmm, if the resource is already taken I wonder how iTCO can work? Are
> > you sure iTCO works on those systems?
> 
> Yes, that's the reason we got a bug report :)
> 4.4 kernel worked, and 4.12 (and later) don't.
> 
> On 4.4.x,
> % /proc/ioports
> 0000-0cf7 : PCI Bus 0000:00
> ...
>   0400-047f : pnp 00:01
>     0400-041f : iTCO_wdt
>       0400-041f : iTCO_wdt
>   0500-0503 : ACPI PM1a_EVT_BLK
> 
> On 4.12.x,
> % /proc/ioports
> 0000-0cf7 : PCI Bus 0000:00
> ...
>   0400-047f : pnp 00:01
>   0500-053f : pnp 00:01
> 
> > > Shouldn't acpi_has_watchdog() rather checks whether the watchdog
> > > device creation succeeded or not?
> > 
> > Yes, or rather we should first figure out what the actual problem is ;-)
> > 
> > Are you able to get acpidump from that system with full dmesg?
> 
> I'll ask the reporter.

Unfortunately we couldn't get approval yet, since it's a prototype
machine.

Meanwhile, the reporter tested the patch below and confirmed to work.
(It might be racy for acpi_has_watchdog() call during the probe, but
 you see the idea.)


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ACPI / watchdog: Return probe fail state from acpi_has_watchdog()

The commit 058dfc767008 ("ACPI / watchdog: Add support for WDAT
hardware watchdog") introduced the WDAT watchdog support and prefers
it over iTCO if present.  However, this introduced a regression on a
machine with funky BIOS that doesn't set up WDAT resources properly.
Since acpi_has_watchdog() checks only the presence of WDAT entry and
it doesn't care whether the acpi watchdog device creation succeeded or
not, it ends up with no watchdog.

This patch fixes acpi_has_wathdog() to return false if the acpi
watchdog probe failed, so that iTCO watchdog can be created as a
fallback.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 drivers/acpi/acpi_watchdog.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_watchdog.c b/drivers/acpi/acpi_watchdog.c
index 11b113f8e367..b421081f0948 100644
--- a/drivers/acpi/acpi_watchdog.c
+++ b/drivers/acpi/acpi_watchdog.c
@@ -17,6 +17,8 @@
 
 #include "internal.h"
 
+static bool watchdog_failed;
+
 /**
  * Returns true if this system should prefer ACPI based watchdog instead of
  * the native one (which are typically the same hardware).
@@ -25,7 +27,7 @@ bool acpi_has_watchdog(void)
 {
 	struct acpi_table_header hdr;
 
-	if (acpi_disabled)
+	if (acpi_disabled || watchdog_failed)
 		return false;
 
 	return ACPI_SUCCESS(acpi_get_table_header(ACPI_SIG_WDAT, 0, &hdr));
@@ -44,6 +46,8 @@ void __init acpi_watchdog_init(void)
 	acpi_status status;
 	int i;
 
+	watchdog_failed = true;
+
 	status = acpi_get_table(ACPI_SIG_WDAT, 0,
 				(struct acpi_table_header **)&wdat);
 	if (ACPI_FAILURE(status)) {
@@ -120,6 +124,8 @@ void __init acpi_watchdog_init(void)
 					       resources, nresources);
 	if (IS_ERR(pdev))
 		pr_err("Device creation failed: %ld\n", PTR_ERR(pdev));
+	else
+		watchdog_failed = false; /* probe success */
 
 	kfree(resources);
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Missing watchdog after ACPI watchdog creation failure
  2018-01-17 11:53     ` Takashi Iwai
@ 2018-01-18 10:20       ` Mika Westerberg
  2018-01-18 11:26         ` Mika Westerberg
  0 siblings, 1 reply; 8+ messages in thread
From: Mika Westerberg @ 2018-01-18 10:20 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Rafael J. Wysocki, linux-kernel

On Wed, Jan 17, 2018 at 12:53:41PM +0100, Takashi Iwai wrote:
> Unfortunately we couldn't get approval yet, since it's a prototype
> machine.

In that case, I think the system itself and its ACPI tables should be
fixed if possible. Windows relies on that table as well so unless there
is something terribly wrong in how we allocate resources in Linux,
Windows should fail the same way. There is good reason why the WDAT
table is there in the first place so using iTCO to poke the hardware
directly might cause some other problems. Windows does not have iTCO
driver at all.

> Meanwhile, the reporter tested the patch below and confirmed to work.
> (It might be racy for acpi_has_watchdog() call during the probe, but
>  you see the idea.)

I would rather not to add any kinds of quirks for systems that are still
in development phase and the BIOS can be fixed. Basic idea is that if
the WDAT table is there we expect it to be correct and at least the
systems I'm aware of that's the case.

Of course if it turns out to be a problem in a real production system we
need to find out what the actual problem is (i.e why the resource
allocation fails in the first place) and fix it there.

That said, if Rafael says we should still add the check, I'll make a
patch that does it (based on yours) and send it upstream :)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Missing watchdog after ACPI watchdog creation failure
  2018-01-18 10:20       ` Mika Westerberg
@ 2018-01-18 11:26         ` Mika Westerberg
  2018-01-18 11:28           ` Takashi Iwai
  2018-01-18 19:35           ` Rafael J. Wysocki
  0 siblings, 2 replies; 8+ messages in thread
From: Mika Westerberg @ 2018-01-18 11:26 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Rafael J. Wysocki, linux-kernel

On Thu, Jan 18, 2018 at 12:20:32PM +0200, Mika Westerberg wrote:
> On Wed, Jan 17, 2018 at 12:53:41PM +0100, Takashi Iwai wrote:
> > Unfortunately we couldn't get approval yet, since it's a prototype
> > machine.
> 
> In that case, I think the system itself and its ACPI tables should be
> fixed if possible. Windows relies on that table as well so unless there
> is something terribly wrong in how we allocate resources in Linux,
> Windows should fail the same way. There is good reason why the WDAT
> table is there in the first place so using iTCO to poke the hardware
> directly might cause some other problems. Windows does not have iTCO
> driver at all.
> 
> > Meanwhile, the reporter tested the patch below and confirmed to work.
> > (It might be racy for acpi_has_watchdog() call during the probe, but
> >  you see the idea.)
> 
> I would rather not to add any kinds of quirks for systems that are still
> in development phase and the BIOS can be fixed. Basic idea is that if
> the WDAT table is there we expect it to be correct and at least the
> systems I'm aware of that's the case.
> 
> Of course if it turns out to be a problem in a real production system we
> need to find out what the actual problem is (i.e why the resource
> allocation fails in the first place) and fix it there.
> 
> That said, if Rafael says we should still add the check, I'll make a
> patch that does it (based on yours) and send it upstream :)

However, we can still check if the WDAT is actually enabled and prevent
creation of the device in that case. It may be that the BIOS always
exposes the table but the device itself is disabled.

Can you ask the reporter to try the below patch and see if it helps?

diff --git a/drivers/acpi/acpi_watchdog.c b/drivers/acpi/acpi_watchdog.c
index 11b113f8e367..e785a1ae57c8 100644
--- a/drivers/acpi/acpi_watchdog.c
+++ b/drivers/acpi/acpi_watchdog.c
@@ -17,18 +17,34 @@
 
 #include "internal.h"
 
+static const struct acpi_table_wdat * acpi_watchdog_get_wdat(void)
+{
+	const struct acpi_table_wdat *wdat = NULL;
+	acpi_status status;
+
+	status = acpi_get_table(ACPI_SIG_WDAT, 0,
+				(struct acpi_table_header **)&wdat);
+	if (ACPI_FAILURE(status)) {
+		/* It is fine if there is no WDAT */
+		return NULL;
+	}
+
+	return wdat;
+}
+
 /**
  * Returns true if this system should prefer ACPI based watchdog instead of
  * the native one (which are typically the same hardware).
  */
 bool acpi_has_watchdog(void)
 {
-	struct acpi_table_header hdr;
+	const struct acpi_table_wdat *wdat;
 
 	if (acpi_disabled)
 		return false;
 
-	return ACPI_SUCCESS(acpi_get_table_header(ACPI_SIG_WDAT, 0, &hdr));
+	wdat = acpi_watchdog_get_wdat();
+	return wdat && (wdat->flags & ACPI_WDAT_ENABLED);
 }
 EXPORT_SYMBOL_GPL(acpi_has_watchdog);
 
@@ -41,18 +57,11 @@ void __init acpi_watchdog_init(void)
 	struct platform_device *pdev;
 	struct resource *resources;
 	size_t nresources = 0;
-	acpi_status status;
 	int i;
 
-	status = acpi_get_table(ACPI_SIG_WDAT, 0,
-				(struct acpi_table_header **)&wdat);
-	if (ACPI_FAILURE(status)) {
-		/* It is fine if there is no WDAT */
-		return;
-	}
-
+	wdat = acpi_watchdog_get_wdat();
 	/* Watchdog disabled by BIOS */
-	if (!(wdat->flags & ACPI_WDAT_ENABLED))
+	if (!wdat || !(wdat->flags & ACPI_WDAT_ENABLED))
 		return;
 
 	/* Skip legacy PCI WDT devices */

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Missing watchdog after ACPI watchdog creation failure
  2018-01-18 11:26         ` Mika Westerberg
@ 2018-01-18 11:28           ` Takashi Iwai
  2018-02-13  9:38             ` Takashi Iwai
  2018-01-18 19:35           ` Rafael J. Wysocki
  1 sibling, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2018-01-18 11:28 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Rafael J. Wysocki, linux-kernel

On Thu, 18 Jan 2018 12:26:37 +0100,
Mika Westerberg wrote:
> 
> On Thu, Jan 18, 2018 at 12:20:32PM +0200, Mika Westerberg wrote:
> > On Wed, Jan 17, 2018 at 12:53:41PM +0100, Takashi Iwai wrote:
> > > Unfortunately we couldn't get approval yet, since it's a prototype
> > > machine.
> > 
> > In that case, I think the system itself and its ACPI tables should be
> > fixed if possible. Windows relies on that table as well so unless there
> > is something terribly wrong in how we allocate resources in Linux,
> > Windows should fail the same way. There is good reason why the WDAT
> > table is there in the first place so using iTCO to poke the hardware
> > directly might cause some other problems. Windows does not have iTCO
> > driver at all.
> > 
> > > Meanwhile, the reporter tested the patch below and confirmed to work.
> > > (It might be racy for acpi_has_watchdog() call during the probe, but
> > >  you see the idea.)
> > 
> > I would rather not to add any kinds of quirks for systems that are still
> > in development phase and the BIOS can be fixed. Basic idea is that if
> > the WDAT table is there we expect it to be correct and at least the
> > systems I'm aware of that's the case.
> > 
> > Of course if it turns out to be a problem in a real production system we
> > need to find out what the actual problem is (i.e why the resource
> > allocation fails in the first place) and fix it there.
> > 
> > That said, if Rafael says we should still add the check, I'll make a
> > patch that does it (based on yours) and send it upstream :)
> 
> However, we can still check if the WDAT is actually enabled and prevent
> creation of the device in that case. It may be that the BIOS always
> exposes the table but the device itself is disabled.
> 
> Can you ask the reporter to try the below patch and see if it helps?

OK, will provide a test kernel and ask for testing with it.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Missing watchdog after ACPI watchdog creation failure
  2018-01-18 11:26         ` Mika Westerberg
  2018-01-18 11:28           ` Takashi Iwai
@ 2018-01-18 19:35           ` Rafael J. Wysocki
  1 sibling, 0 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2018-01-18 19:35 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Takashi Iwai, Rafael J. Wysocki, linux-kernel

On Thursday, January 18, 2018 12:26:37 PM CET Mika Westerberg wrote:
> On Thu, Jan 18, 2018 at 12:20:32PM +0200, Mika Westerberg wrote:
> > On Wed, Jan 17, 2018 at 12:53:41PM +0100, Takashi Iwai wrote:
> > > Unfortunately we couldn't get approval yet, since it's a prototype
> > > machine.
> > 
> > In that case, I think the system itself and its ACPI tables should be
> > fixed if possible. Windows relies on that table as well so unless there
> > is something terribly wrong in how we allocate resources in Linux,
> > Windows should fail the same way. There is good reason why the WDAT
> > table is there in the first place so using iTCO to poke the hardware
> > directly might cause some other problems. Windows does not have iTCO
> > driver at all.
> > 
> > > Meanwhile, the reporter tested the patch below and confirmed to work.
> > > (It might be racy for acpi_has_watchdog() call during the probe, but
> > >  you see the idea.)
> > 
> > I would rather not to add any kinds of quirks for systems that are still
> > in development phase and the BIOS can be fixed. Basic idea is that if
> > the WDAT table is there we expect it to be correct and at least the
> > systems I'm aware of that's the case.
> > 
> > Of course if it turns out to be a problem in a real production system we
> > need to find out what the actual problem is (i.e why the resource
> > allocation fails in the first place) and fix it there.
> > 
> > That said, if Rafael says we should still add the check, I'll make a
> > patch that does it (based on yours) and send it upstream :)
> 
> However, we can still check if the WDAT is actually enabled and prevent
> creation of the device in that case. It may be that the BIOS always
> exposes the table but the device itself is disabled.

Adding quirks for systems under development that may be subject to changes is
rather pointless IMO, but of course general sanity checks that make sense
should be done (and possibly complain about inconsistencies if any).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Missing watchdog after ACPI watchdog creation failure
  2018-01-18 11:28           ` Takashi Iwai
@ 2018-02-13  9:38             ` Takashi Iwai
  0 siblings, 0 replies; 8+ messages in thread
From: Takashi Iwai @ 2018-02-13  9:38 UTC (permalink / raw)
  To: Mika Westerberg; +Cc: Rafael J. Wysocki, linux-kernel

On Thu, 18 Jan 2018 12:28:26 +0100,
Takashi Iwai wrote:
> 
> On Thu, 18 Jan 2018 12:26:37 +0100,
> Mika Westerberg wrote:
> > 
> > On Thu, Jan 18, 2018 at 12:20:32PM +0200, Mika Westerberg wrote:
> > > On Wed, Jan 17, 2018 at 12:53:41PM +0100, Takashi Iwai wrote:
> > > > Unfortunately we couldn't get approval yet, since it's a prototype
> > > > machine.
> > > 
> > > In that case, I think the system itself and its ACPI tables should be
> > > fixed if possible. Windows relies on that table as well so unless there
> > > is something terribly wrong in how we allocate resources in Linux,
> > > Windows should fail the same way. There is good reason why the WDAT
> > > table is there in the first place so using iTCO to poke the hardware
> > > directly might cause some other problems. Windows does not have iTCO
> > > driver at all.
> > > 
> > > > Meanwhile, the reporter tested the patch below and confirmed to work.
> > > > (It might be racy for acpi_has_watchdog() call during the probe, but
> > > >  you see the idea.)
> > > 
> > > I would rather not to add any kinds of quirks for systems that are still
> > > in development phase and the BIOS can be fixed. Basic idea is that if
> > > the WDAT table is there we expect it to be correct and at least the
> > > systems I'm aware of that's the case.
> > > 
> > > Of course if it turns out to be a problem in a real production system we
> > > need to find out what the actual problem is (i.e why the resource
> > > allocation fails in the first place) and fix it there.
> > > 
> > > That said, if Rafael says we should still add the check, I'll make a
> > > patch that does it (based on yours) and send it upstream :)
> > 
> > However, we can still check if the WDAT is actually enabled and prevent
> > creation of the device in that case. It may be that the BIOS always
> > exposes the table but the device itself is disabled.
> > 
> > Can you ask the reporter to try the below patch and see if it helps?
> 
> OK, will provide a test kernel and ask for testing with it.

Sorry for reviving the old thread: I've been off for the last few
weeks.

Now we got the updated information.  The result was negative,
unfortunately the patch didn't help.  Also, I was told that the
problem is seen on the production system, so this isn't only about the
prototype one.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-02-13  9:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <s5hd12h21cg.wl-tiwai@suse.de>
2018-01-10 15:23 ` Missing watchdog after ACPI watchdog creation failure Mika Westerberg
2018-01-10 15:43   ` Takashi Iwai
2018-01-17 11:53     ` Takashi Iwai
2018-01-18 10:20       ` Mika Westerberg
2018-01-18 11:26         ` Mika Westerberg
2018-01-18 11:28           ` Takashi Iwai
2018-02-13  9:38             ` Takashi Iwai
2018-01-18 19:35           ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).