Linux-Watchdog Archive on lore.kernel.org
From: Muni Sekhar <munisekharrms@gmail.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: linux-watchdog@vger.kernel.org, wim@linux-watchdog.org,
	Bjorn Helgaas <helgaas@kernel.org>
Subject: Re: watchdog: how to enable?
Date: Mon, 18 Nov 2019 20:37:26 +0530
Message-ID: <CAHhAz+jA-7518cyYvBySbqxtTCUfsiHN4NbZW3mSaqu8F9Zm=g@mail.gmail.com>
In-Reply-To: <da120ac6-062a-3dcc-e635-979fdd021592@roeck-us.net>

On Mon, Nov 18, 2019 at 7:40 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 11/18/19 1:52 AM, Muni Sekhar wrote:
> > On Sun, Nov 17, 2019 at 3:12 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >>
> >> On 11/16/19 10:34 AM, Muni Sekhar wrote:
> >>> On Sat, Nov 16, 2019 at 9:31 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>>>
> >>>> On 11/15/19 7:03 PM, Muni Sekhar wrote:
> >>>> [ ... ]
> >>>>>>
> >>>>>> Another possibility, of course, might be to enable a hardware watchdog
> >>>>>> in your system (assuming it supports one). I personally would not trust
> >>>>>> the NMI watchdog to detect a system hang; after all, there are
> >>>>>> situations where even NMIs no longer work.
> >>>>>
> >>>>> From dmesg, is it possible to know whether my system supports a
> >>>>> hardware watchdog or not?
> >>>>> Assuming my system supports a hardware watchdog, how do I
> >>>>> enable it to debug the system freeze issues?
> >>>>>
> >>>>
> >>>> Hardware watchdog support really depends on the board type. Most PC
> >>>> mainboards support a watchdog in the Super-IO chip, but on some it is
> >>>> not wired correctly. On embedded boards it is often built into the SoC.
> >>>> The easiest way to see if you have a watchdog would be to check for the
> >>>> existence of /dev/watchdog. However, on a PC that would most likely
> >>>> not be there because the necessary module is not auto-loaded.
> >>>> If you tell us your board type, or better the Super-IO chip on the board,
> >>>> we might be able to help.
> >>>
> >>> I have two identically configured systems. On one I installed
> >>> a vanilla kernel, and I see the /dev/watchdog and /dev/watchdog0
> >>> nodes. On the other I'm running the Ubuntu distribution kernel,
> >>> and I don't see any watchdog device node. So it looks like I need to
> >>> manually load the kernel module on the distro kernel. Is there a way to
> >>> find out which kernel module corresponds to the /dev/watchdog node?
> >>>
> >>> # ls -l /dev/watchdog*
> >>> crw------- 1 root root  10, 130 Nov 15 17:15 /dev/watchdog
> >>> crw------- 1 root root 248,   0 Nov 15 17:15 /dev/watchdog0
> >>>
> >>> # ps -ax | grep watchdog
> >>>     678 ?        S      0:00 [watchdogd]
> >>>
> >>> Regarding the Super-IO chip, how do I find out its model?
> >>>
> >> You could try to run sensors-detect (from the "sensors" package).
> >>
> >> If you can boot a system with /dev/watchdog0, you should see the type
> >> in /sys/class/watchdog/watchdog0/identity.
> > I could not find the /sys/class/watchdog/watchdog0/identity and
> > /sys/class/watchdog/watchdog0/timeout files.
> > $ ls -l /sys/class/watchdog/watchdog0/
> > total 0
> > -r--r--r-- 1 root root 4096 Nov 18 15:12 dev
> > lrwxrwxrwx 1 root root    0 Nov 18 15:12 device -> ../../../iTCO_wdt.0.auto
> > drwxr-xr-x 2 root root    0 Nov 18 15:12 power
> > lrwxrwxrwx 1 root root    0 Nov 18 14:53 subsystem -> ../../../../../../class/watchdog
> > -rw-r--r-- 1 root root 4096 Nov 18 14:53 uevent
> >
>
> Presumably CONFIG_WATCHDOG_SYSFS is not enabled in your configuration.
>
> >>
> >> Also, you can test if the watchdog works with "sudo cat /dev/watchdog",
> >> assuming the watchdog daemon is not running. The watchdog works if the
> >> system reboots after the watchdog times out (/sys/class/watchdog/watchdog0/timeout
> >> is the timeout in seconds).
> > sudo cat /dev/watchdog rebooted my system as expected. I don't see a
> > timeout node; how do I configure the timeout value?
>
> sudo apt-get install watchdog
> man watchdog
>
> should tell you. Alternatively, enable CONFIG_WATCHDOG_SYSFS.
>
> >>
> >>>>
> >>>> Note though that this won't help to debug the problem. A hardware
> >>>> watchdog resets the system. It helps to recover, but it is not intended
> >>>> to help with debugging.
> >>> How do I use the hardware watchdog to reset my system when it is
> >>> frozen? That would let me collect a crashdump and ultimately
> >>> find the root cause of the freeze.
> >>>
> >> There won't be a crashdump. It just hard-resets the system.
> > So is there any other way to capture a crashdump or trigger a
> > soft reboot once the kernel is locked up?
>
> Not that I know of. I suspect, though, that you either have a hard lockup
> where even NMI is non-operational, or NMI doesn't work in your system
> to start with.
>
> If you have nmi_watchdog=1 in your kernel command line, /proc/interrupts
> should show a non-zero number of NMI interrupts. Do you see that in your system ?

Yes, I see a non-zero number. When is it (the NMI interrupt count) supposed to change?

$ cat /proc/interrupts | grep NMI
 NMI:       4129       4153       4192        183   Non-maskable interrupts
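
One way to see whether the counts are still changing is to sample
/proc/interrupts twice and compare the per-CPU values. Below is a minimal
sketch of that idea. The helper names (nmi_counts, nmi_deltas,
sample_nmi_deltas) are made up for illustration, and it assumes the "NMI:"
row has the same layout as the output above.

```python
import time


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description text
                counts.append(int(field))
            return counts
    return []  # no NMI row present


def nmi_deltas(before, after):
    """Per-CPU difference between two samples (after minus before)."""
    return [b - a for a, b in zip(before, after)]


def sample_nmi_deltas(interval=10.0, path="/proc/interrupts"):
    """Read the NMI row twice, `interval` seconds apart, and return the deltas."""
    with open(path) as f:
        before = nmi_counts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = nmi_counts(f.read())
    return nmi_deltas(before, after)
```

Calling sample_nmi_deltas(10) on the affected machine returns one number per
CPU; a CPU whose delta is above zero received NMIs during the interval.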

$ dmesg | grep NMI
[    0.402175] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[    0.402199] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[    0.402220] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[    0.402242] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
[    4.636467] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[    4.658289] | NMI testsuite:
[   13.863284] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 9.744 msecs

I also enabled pstore/ramoops. When I test the hardware watchdog by
running 'sudo cat /dev/watchdog', the console dump is updated on the
next boot. I see this behavior consistently.

$ cat /sys/fs/pstore/console-ramoops-0
[  293.462623] printk: console [pstore-1] enabled
[  293.471026] pstore: Registered ramoops as persistent store backend
[  293.477800] ramoops: using 0x100000@0x3ff00000, ecc: 16
[  315.461263] systemd-journald[1665]: Sent WATCHDOG=1 notification.
[  317.447791] watchdog: watchdog0: nowayout prevents watchdog being stopped!
[  317.456616] watchdog: watchdog0: watchdog did not stop!
No errors detected
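
To compare the two reset scenarios after each reboot, the contents of the
pstore directory can be listed programmatically. This is only a sketch: the
helper names (pstore_entries, has_console_log) are hypothetical, and it
assumes pstore is mounted at /sys/fs/pstore as in the command above.

```python
import os


def pstore_entries(pstore_dir="/sys/fs/pstore"):
    """Return {file name: size in bytes} for regular files in pstore_dir.

    An empty dict is returned when the directory does not exist,
    for example when pstore is not mounted.
    """
    if not os.path.isdir(pstore_dir):
        return {}
    entries = {}
    for name in sorted(os.listdir(pstore_dir)):
        path = os.path.join(pstore_dir, name)
        if os.path.isfile(path):
            entries[name] = os.path.getsize(path)
    return entries


def has_console_log(entries):
    """True if any entry name looks like a ramoops console record."""
    return any(name.startswith("console-ramoops") for name in entries)
```

Running has_console_log(pstore_entries()) right after each boot gives a quick
yes/no on whether a console record survived the reset.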

Next I installed the watchdog daemon and started that service before
the kernel locks up. After I triggered a few tests the kernel locked up
and the hardware watchdog reset the system, but in this case I don't see
the console-ramoops-0 file. The only difference is that this time the
'watchdog' daemon was driving the hardware watchdog. I'm not sure why the
console dump was not updated in this scenario.


>
> Guenter



-- 
Thanks,
Sekhar

