linux-watchdog.vger.kernel.org archive mirror
* watchdog: how to enable?
@ 2019-11-16  0:35 Muni Sekhar
  2019-11-16  1:04 ` Guenter Roeck
  2019-11-18 14:38 ` Bjorn Helgaas
  0 siblings, 2 replies; 14+ messages in thread
From: Muni Sekhar @ 2019-11-16  0:35 UTC
  To: linux-watchdog, linux-pci, wim, linux

[ Please keep me in CC as I'm not subscribed to the list]

Hi All,

My kernel is built with the following options:

$ cat /boot/config-5.0.1 | grep NO_HZ
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_RCU_FAST_NO_HZ=y

I booted with the watchdog enabled (nmi_watchdog=1) as shown below:

BOOT_IMAGE=/boot/vmlinuz-5.0.1
root=UUID=f65454ae-3f1d-4b9e-b4be-74a29becbe1e ro debug
ignore_loglevel console=ttyUSB0,115200 console=tty0 console=tty1
console=ttyS2,115200 memmap=1M!1023M nmi_watchdog=1
crashkernel=384M-:128M

When the system is frozen or the kernel is locked up (I noticed that in
this state the kernel does not respond to ALT-SysRq-<command key>), the
watchdog is not triggered. So I want to understand how to enable the
watchdog timer and how to verify its basic functionality.

Any pointers on this will be greatly appreciated.
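One way to verify the lockup detectors is to provoke a lockup on purpose. A minimal sketch, assuming a test machine whose kernel has LKDTM (CONFIG_LKDTM) and debugfs mounted at /sys/kernel/debug: writing the crash type HARDLOCKUP or SOFTLOCKUP to LKDTM's provoke-crash/DIRECT file triggers that lockup immediately. The helper `provoke_lockup` and its optional path argument are hypothetical, added only so the function can be dry-run against an ordinary file.

```sh
#!/bin/sh
# Hypothetical helper: ask LKDTM to trigger a deliberate lockup.
# Assumes CONFIG_LKDTM (built in or the lkdtm module loaded) and debugfs
# mounted at /sys/kernel/debug. $2 overrides the DIRECT path for dry runs.
provoke_lockup() {
    lk_type=$1
    lk_direct=${2:-/sys/kernel/debug/provoke-crash/DIRECT}

    case $lk_type in
        HARDLOCKUP|SOFTLOCKUP) ;;   # the cases the lockup detectors cover
        *) echo "unsupported type: $lk_type" >&2; return 2 ;;
    esac

    if [ ! -w "$lk_direct" ]; then
        echo "cannot write $lk_direct (LKDTM missing or not root?)" >&2
        return 1
    fi

    # Writing the crash type name to DIRECT triggers it right away.
    echo "$lk_type" > "$lk_direct"
}
```

Run it as root on a machine you can afford to hang, e.g. `provoke_lockup SOFTLOCKUP`. According to Documentation/admin-guide/lockup-watchdogs.rst, with the default watchdog_thresh of 10 a soft lockup should be reported after roughly 20 seconds and a hard lockup after roughly 10; if nothing is reported, the detector is not working on that system.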

--
Thanks,
Sekhar


* Re: watchdog: how to enable?
  2019-11-16  0:35 watchdog: how to enable? Muni Sekhar
@ 2019-11-16  1:04 ` Guenter Roeck
  2019-11-16  3:03   ` Muni Sekhar
  2019-11-18 14:38 ` Bjorn Helgaas
  1 sibling, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2019-11-16  1:04 UTC
  To: Muni Sekhar, linux-watchdog, linux-pci, wim

On 11/15/19 4:35 PM, Muni Sekhar wrote:
> [ Please keep me in CC as I'm not subscribed to the list]
> 
> Hi All,
> 
> My kernel is built with the following options:
> 
> $ cat /boot/config-5.0.1 | grep NO_HZ
> CONFIG_NO_HZ_COMMON=y
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_NO_HZ=y
> CONFIG_RCU_FAST_NO_HZ=y
> 
> I booted with the watchdog enabled (nmi_watchdog=1) as shown below:
> 
> BOOT_IMAGE=/boot/vmlinuz-5.0.1
> root=UUID=f65454ae-3f1d-4b9e-b4be-74a29becbe1e ro debug
> ignore_loglevel console=ttyUSB0,115200 console=tty0 console=tty1
> console=ttyS2,115200 memmap=1M!1023M nmi_watchdog=1
> crashkernel=384M-:128M
> 
> When the system is frozen or the kernel is locked up (I noticed that in
> this state the kernel does not respond to ALT-SysRq-<command key>), the
> watchdog is not triggered. So I want to understand how to enable the
> watchdog timer and how to verify its basic functionality.
>
> Any pointers on this will be greatly appreciated.
> 
Sorry, I do not have an answer. Please note that you are talking about
the NMI watchdog, which is completely unrelated to hardware watchdogs
and not handled by the watchdog subsystem. I would suggest sending
your question to the Linux kernel mailing list and clearly stating
that you are talking about the NMI watchdog.

Please note that, for the NMI watchdog to do anything, you must have
CONFIG_HARDLOCKUP_DETECTOR enabled in your kernel configuration. I don't
know what, if anything, the configuration options you listed above have
to do with the NMI watchdog.
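Whether a kernel was actually built with the lockup-detector options (or with related ones such as CONFIG_HAVE_NMI_WATCHDOG or CONFIG_WATCHDOG_SYSFS) can be checked against its config file. A minimal sketch, assuming the config is a plain-text file such as /boot/config-5.0.1; `kconfig_status` is a hypothetical helper name, not an existing tool.

```sh
#!/bin/sh
# Hypothetical helper: report how each requested CONFIG_ option is set in a
# kernel .config file, one "NAME=value" or "NAME is not set" line per option.
kconfig_status() {
    kc_file=$1; shift
    for kc_opt in "$@"; do
        # Anchor on "NAME=" so CONFIG_FOO does not match CONFIG_FOO_BAR=...
        kc_line=$(grep -E "^${kc_opt}=" "$kc_file" | tail -n 1)
        if [ -n "$kc_line" ]; then
            echo "$kc_line"
        else
            echo "$kc_opt is not set"
        fi
    done
}
```

For example: `kconfig_status /boot/config-5.0.1 CONFIG_HARDLOCKUP_DETECTOR CONFIG_SOFTLOCKUP_DETECTOR CONFIG_WATCHDOG_SYSFS`. A disabled option is reported as "is not set" whether or not the file contains the `# ... is not set` comment for it.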

Another possibility, of course, might be to enable a hardware watchdog
in your system (assuming it supports one). I personally would not trust
the NMI watchdog to detect a system hang because, after all, there are
situations where even NMIs no longer work.

Guenter


* Re: watchdog: how to enable?
  2019-11-16  1:04 ` Guenter Roeck
@ 2019-11-16  3:03   ` Muni Sekhar
  2019-11-16 16:01     ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Muni Sekhar @ 2019-11-16  3:03 UTC
  To: Guenter Roeck; +Cc: linux-watchdog, linux-pci, wim

On Sat, Nov 16, 2019 at 6:34 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 11/15/19 4:35 PM, Muni Sekhar wrote:
> > [ Please keep me in CC as I'm not subscribed to the list]
> >
> > Hi All,
> >
> > My kernel is built with the following options:
> >
> > $ cat /boot/config-5.0.1 | grep NO_HZ
> > CONFIG_NO_HZ_COMMON=y
> > CONFIG_NO_HZ_IDLE=y
> > # CONFIG_NO_HZ_FULL is not set
> > CONFIG_NO_HZ=y
> > CONFIG_RCU_FAST_NO_HZ=y
> >
> > I booted with the watchdog enabled (nmi_watchdog=1) as shown below:
> >
> > BOOT_IMAGE=/boot/vmlinuz-5.0.1
> > root=UUID=f65454ae-3f1d-4b9e-b4be-74a29becbe1e ro debug
> > ignore_loglevel console=ttyUSB0,115200 console=tty0 console=tty1
> > console=ttyS2,115200 memmap=1M!1023M nmi_watchdog=1
> > crashkernel=384M-:128M
> >
> > When the system is frozen or the kernel is locked up (I noticed that in
> > this state the kernel does not respond to ALT-SysRq-<command key>), the
> > watchdog is not triggered. So I want to understand how to enable the
> > watchdog timer and how to verify its basic functionality.
> >
> > Any pointers on this will be greatly appreciated.
> >
> Sorry, I do not have an answer. Please note that you are talking about
> the NMI watchdog, which is completely unrelated to hardware watchdogs
> and not handled by the watchdog subsystem. I would suggest sending
> your question to the Linux kernel mailing list and clearly stating
> that you are talking about the NMI watchdog.
>
> Please note that, for the NMI watchdog to do anything, you must have
> CONFIG_HARDLOCKUP_DETECTOR enabled in your kernel configuration. I don't
> know what, if anything, the configuration options you listed above have
> to do with the NMI watchdog.

Thank you for your response. I enabled the hard/soft lockup detector
config options. My kernel is built with the following .config options:

CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HARDLOCKUP_CHECK_TIMESTAMP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=1
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=1

I also set the following sysctls under /proc/sys/:

kernel.softlockup_panic = 1
kernel.hardlockup_panic = 1
kernel.unknown_nmi_panic = 1
kernel.softlockup_all_cpu_backtrace = 1
kernel.hardlockup_all_cpu_backtrace = 1
kernel.panic = 3
kernel.panic_on_io_nmi = 1
kernel.panic_on_oops = 1
kernel.panic_on_stackoverflow = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.panic_on_rcu_stall = 1
kernel.panic_print = 31
kernel.sysrq=0x1FF
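Values set with sysctl can be read back from /proc/sys, where each dot in the name becomes a path separator (kernel.softlockup_panic is /proc/sys/kernel/softlockup_panic). A minimal sketch under that assumption; the helper `sysctl_show` and its SYSCTL_ROOT override are hypothetical, added only so it can be pointed at a copy of the tree.

```sh
#!/bin/sh
# Hypothetical helper: print "name = value" for each sysctl name given,
# reading the matching file under /proc/sys (dots become slashes).
# SYSCTL_ROOT may point at another directory, e.g. a copy used for testing.
sysctl_show() {
    sc_root=${SYSCTL_ROOT:-/proc/sys}
    for sc_name in "$@"; do
        sc_path="$sc_root/$(printf '%s' "$sc_name" | tr . /)"
        if [ -r "$sc_path" ]; then
            printf '%s = %s\n' "$sc_name" "$(cat "$sc_path")"
        else
            printf '%s: not available\n' "$sc_name"
        fi
    done
}
```

For example, `sysctl_show kernel.nmi_watchdog kernel.watchdog_thresh kernel.hardlockup_panic` shows the NMI watchdog's own on/off switch and threshold next to the panic setting; those entries are described in Documentation/admin-guide/sysctl/kernel.rst.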


The document https://www.kernel.org/doc/Documentation/lockup-watchdogs.txt
says: “By default, the watchdog runs on all online cores.  However, on a
kernel configured with NO_HZ_FULL, by default the watchdog runs only
on the housekeeping cores, not the cores specified in the "nohz_full"
boot argument.” That is why I mentioned my kernel's CONFIG_NO_HZ* options.

>
> Another possibility, of course, might be to enable a hardware watchdog
> in your system (assuming it supports one). I personally would not trust
> the NMI watchdog to detect a system hang because, after all, there are
> situations where even NMIs no longer work.

From dmesg, is it possible to know whether my system supports a
hardware watchdog or not?
Assuming that my system supports a hardware watchdog, how do I
enable it to debug the system freeze issues?


>
> Guenter



-- 
Thanks,
Sekhar


* Re: watchdog: how to enable?
  2019-11-16  3:03   ` Muni Sekhar
@ 2019-11-16 16:01     ` Guenter Roeck
  2019-11-16 18:34       ` Muni Sekhar
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2019-11-16 16:01 UTC
  To: Muni Sekhar; +Cc: linux-watchdog, linux-pci, wim

On 11/15/19 7:03 PM, Muni Sekhar wrote:
[ ... ]
>>
>> Another possibility, of course, might be to enable a hardware watchdog
>> in your system (assuming it supports one). I personally would not trust
>> the NMI watchdog to detect a system hang because, after all, there are
>> situations where even NMIs no longer work.
> 
> From dmesg, is it possible to know whether my system supports a
> hardware watchdog or not?
> Assuming that my system supports a hardware watchdog, how do I
> enable it to debug the system freeze issues?
> 

Hardware watchdog support really depends on the board type. Most PC
mainboards support a watchdog in the Super-IO chip, but on some it is
not wired correctly. On embedded boards it is often built into the SoC.
The easiest way to see if you have a watchdog would be to check for the
existence of /dev/watchdog. However, on a PC that would most likely
not be there because the necessary module is not auto-loaded.
If you tell us your board type, or better the Super-IO chip on the board,
we might be able to help.

Note though that this won't help to debug the problem. A hardware
watchdog resets the system. It helps to recover, but it is not intended
to help with debugging.

Guenter


* Re: watchdog: how to enable?
  2019-11-16 16:01     ` Guenter Roeck
@ 2019-11-16 18:34       ` Muni Sekhar
  2019-11-16 21:42         ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Muni Sekhar @ 2019-11-16 18:34 UTC
  To: Guenter Roeck; +Cc: linux-watchdog, linux-pci, wim

On Sat, Nov 16, 2019 at 9:31 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 11/15/19 7:03 PM, Muni Sekhar wrote:
> [ ... ]
> >>
> >> Another possibility, of course, might be to enable a hardware watchdog
> >> in your system (assuming it supports one). I personally would not trust
> >> the NMI watchdog to detect a system hang because, after all, there are
> >> situations where even NMIs no longer work.
> >
> > From dmesg, is it possible to know whether my system supports a
> > hardware watchdog or not?
> > Assuming that my system supports a hardware watchdog, how do I
> > enable it to debug the system freeze issues?
> >
>
> Hardware watchdog support really depends on the board type. Most PC
> mainboards support a watchdog in the Super-IO chip, but on some it is
> not wired correctly. On embedded boards it is often built into the SoC.
> The easiest way to see if you have a watchdog would be to check for the
> existence of /dev/watchdog. However, on a PC that would most likely
> not be there because the necessary module is not auto-loaded.
> If you tell us your board type, or better the Super-IO chip on the board,
> we might be able to help.

I have two systems with the same configuration. On one I installed
the vanilla kernel, and I see the /dev/watchdog and /dev/watchdog0
nodes. The other runs the Ubuntu distribution kernel, and there I
don't see any watchdog device node. So it looks like I need to load
the kernel module manually on the distro kernel. Is there a way to
find out which kernel module corresponds to the /dev/watchdog node?

# ls -l /dev/watchdog*
crw------- 1 root root  10, 130 Nov 15 17:15 /dev/watchdog
crw------- 1 root root 248,   0 Nov 15 17:15 /dev/watchdog0

# ps -ax | grep watchdog
  678 ?        S      0:00 [watchdogd]

Regarding the Super-IO chip, how do I find out its model?

>
> Note though that this won't help to debug the problem. A hardware
> watchdog resets the system. It helps to recover, but it is not intended
> to help with debugging.
How do I use the hardware watchdog to reset my system when it is
frozen? That would help me collect a crash dump and ultimately find
the root cause of the system freeze.

>
> Guenter



-- 
Thanks,
Sekhar


* Re: watchdog: how to enable?
  2019-11-16 18:34       ` Muni Sekhar
@ 2019-11-16 21:42         ` Guenter Roeck
  2019-11-18  9:52           ` Muni Sekhar
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2019-11-16 21:42 UTC
  To: Muni Sekhar; +Cc: linux-watchdog, linux-pci, wim

On 11/16/19 10:34 AM, Muni Sekhar wrote:
> On Sat, Nov 16, 2019 at 9:31 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On 11/15/19 7:03 PM, Muni Sekhar wrote:
>> [ ... ]
>>>>
>>>> Another possibility, of course, might be to enable a hardware watchdog
>>>> in your system (assuming it supports one). I personally would not trust
>>>> the NMI watchdog to detect a system hang because, after all, there are
>>>> situations where even NMIs no longer work.
>>>
>>> From dmesg, is it possible to know whether my system supports a
>>> hardware watchdog or not?
>>> Assuming that my system supports a hardware watchdog, how do I
>>> enable it to debug the system freeze issues?
>>>
>>
>> Hardware watchdog support really depends on the board type. Most PC
>> mainboards support a watchdog in the Super-IO chip, but on some it is
>> not wired correctly. On embedded boards it is often built into the SoC.
>> The easiest way to see if you have a watchdog would be to check for the
>> existence of /dev/watchdog. However, on a PC that would most likely
>> not be there because the necessary module is not auto-loaded.
>> If you tell us your board type, or better the Super-IO chip on the board,
>> we might be able to help.
> 
> I have two systems with the same configuration. On one I installed
> the vanilla kernel, and I see the /dev/watchdog and /dev/watchdog0
> nodes. The other runs the Ubuntu distribution kernel, and there I
> don't see any watchdog device node. So it looks like I need to load
> the kernel module manually on the distro kernel. Is there a way to
> find out which kernel module corresponds to the /dev/watchdog node?
> 
> # ls -l /dev/watchdog*
> crw------- 1 root root  10, 130 Nov 15 17:15 /dev/watchdog
> crw------- 1 root root 248,   0 Nov 15 17:15 /dev/watchdog0
> 
> # ps -ax | grep watchdog
>    678 ?        S      0:00 [watchdogd]
> 
> Regarding the Super-IO chip, how do I find out its model?
> 
You could try to run sensors-detect (from the "lm-sensors" package).

If you can boot a system with /dev/watchdog0, you should see the type
in /sys/class/watchdog/watchdog0/identity.
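If the identity attribute is not present, the driver behind each watchdog device can usually still be found through the generic sysfs device links, assuming the common layout where /sys/class/watchdog/watchdogN/device points at the device and device/driver at the driver bound to it. A minimal sketch; `wd_drivers` and its optional class-directory argument are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: for each watchdog class device print its name, the
# bound driver (from the device/driver symlink) and the identity string
# (only present with CONFIG_WATCHDOG_SYSFS=y), tab-separated.
wd_drivers() {
    wd_class=${1:-/sys/class/watchdog}
    for wd_node in "$wd_class"/watchdog*; do
        [ -e "$wd_node" ] || continue        # glob matched nothing
        wd_drv="-"
        if [ -L "$wd_node/device/driver" ]; then
            wd_drv=$(basename "$(readlink "$wd_node/device/driver")")
        fi
        wd_ident="-"
        if [ -r "$wd_node/identity" ]; then
            wd_ident=$(cat "$wd_node/identity")
        fi
        printf '%s\t%s\t%s\n' "$(basename "$wd_node")" "$wd_drv" "$wd_ident"
    done
}
```

If the device link points at something like iTCO_wdt.0.auto, the driver column would be expected to read iTCO_wdt. Driver and module names usually, but not always, match, so that name is a reasonable first guess for `modprobe` on a kernel where no watchdog node appears.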

Also, you can test if the watchdog works with "sudo cat /dev/watchdog",
assuming the watchdog daemon is not running. The watchdog works if the
system reboots after the watchdog times out (/sys/class/watchdog/watchdog0/timeout
is the timeout in seconds).
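`cat /dev/watchdog` opens the device, which arms the timer, and never writes to it, so the reset is the expected outcome. A gentler check, assuming the driver supports "magic close", nowayout is off, and the watchdog daemon is not running, is to keep the device open, write to it periodically (any write counts as a keepalive), and write the character 'V' just before closing so the driver stops the timer. A minimal sketch; `wd_pet` is a hypothetical name, and it can be dry-run against an ordinary file.

```sh
#!/bin/sh
# Hypothetical helper: open a watchdog device once, ping it COUNT times,
# INTERVAL seconds apart, then write the magic character 'V' and close it.
# With magic close supported (and nowayout off) this should stop the timer
# instead of resetting the machine. INTERVAL must stay below the timeout.
wd_pet() {
    wd_dev=${1:-/dev/watchdog}
    wd_interval=${2:-5}
    wd_count=${3:-3}

    if [ ! -w "$wd_dev" ]; then
        echo "cannot open $wd_dev for writing" >&2
        return 1
    fi

    exec 3>"$wd_dev"        # opening the device starts the timer
    wd_i=0
    while [ "$wd_i" -lt "$wd_count" ]; do
        printf '.' >&3      # any write is treated as a keepalive
        wd_i=$((wd_i + 1))
        sleep "$wd_interval"
    done
    printf 'V' >&3          # magic character: request a clean stop
    exec 3>&-               # the driver acts on the 'V' when the fd closes
}
```

If the kernel logs "nowayout prevents watchdog being stopped!" after this, the system will still reset once the timeout expires.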

>>
>> Note though that this won't help to debug the problem. A hardware
>> watchdog resets the system. It helps to recover, but it is not intended
>> to help with debugging.
> How do I use the hardware watchdog to reset my system when it is
> frozen? That would help me collect a crash dump and ultimately find
> the root cause of the system freeze.
> 
There won't be a crashdump. It just hard-resets the system.

Guenter


* Re: watchdog: how to enable?
  2019-11-16 21:42         ` Guenter Roeck
@ 2019-11-18  9:52           ` Muni Sekhar
  2019-11-18 14:10             ` Guenter Roeck
  0 siblings, 1 reply; 14+ messages in thread
From: Muni Sekhar @ 2019-11-18  9:52 UTC
  To: Guenter Roeck; +Cc: linux-watchdog, linux-pci, wim

On Sun, Nov 17, 2019 at 3:12 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 11/16/19 10:34 AM, Muni Sekhar wrote:
> > On Sat, Nov 16, 2019 at 9:31 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>
> >> On 11/15/19 7:03 PM, Muni Sekhar wrote:
> >> [ ... ]
> >>>>
> >>>> Another possibility, of course, might be to enable a hardware watchdog
> >>>> in your system (assuming it supports one). I personally would not trust
> >>>> the NMI watchdog to detect a system hang because, after all, there are
> >>>> situations where even NMIs no longer work.
> >>>
> >>> From dmesg, is it possible to know whether my system supports a
> >>> hardware watchdog or not?
> >>> Assuming that my system supports a hardware watchdog, how do I
> >>> enable it to debug the system freeze issues?
> >>>
> >>
> >> Hardware watchdog support really depends on the board type. Most PC
> >> mainboards support a watchdog in the Super-IO chip, but on some it is
> >> not wired correctly. On embedded boards it is often built into the SoC.
> >> The easiest way to see if you have a watchdog would be to check for the
> >> existence of /dev/watchdog. However, on a PC that would most likely
> >> not be there because the necessary module is not auto-loaded.
> >> If you tell us your board type, or better the Super-IO chip on the board,
> >> we might be able to help.
> >
> > I have two systems with the same configuration. On one I installed
> > the vanilla kernel, and I see the /dev/watchdog and /dev/watchdog0
> > nodes. The other runs the Ubuntu distribution kernel, and there I
> > don't see any watchdog device node. So it looks like I need to load
> > the kernel module manually on the distro kernel. Is there a way to
> > find out which kernel module corresponds to the /dev/watchdog node?
> >
> > # ls -l /dev/watchdog*
> > crw------- 1 root root  10, 130 Nov 15 17:15 /dev/watchdog
> > crw------- 1 root root 248,   0 Nov 15 17:15 /dev/watchdog0
> >
> > # ps -ax | grep watchdog
> >    678 ?        S      0:00 [watchdogd]
> >
> > Regarding the Super-IO chip, how do I find out its model?
> >
> You could try to run sensors-detect (from the "lm-sensors" package).
>
> If you can boot a system with /dev/watchdog0, you should see the type
> in /sys/class/watchdog/watchdog0/identity.
I could not find the /sys/class/watchdog/watchdog0/identity and
/sys/class/watchdog/watchdog0/timeout files.
$ ls -l /sys/class/watchdog/watchdog0/
total 0
-r--r--r-- 1 root root 4096 Nov 18 15:12 dev
lrwxrwxrwx 1 root root    0 Nov 18 15:12 device -> ../../../iTCO_wdt.0.auto
drwxr-xr-x 2 root root    0 Nov 18 15:12 power
lrwxrwxrwx 1 root root    0 Nov 18 14:53 subsystem -> ../../../../../../class/watchdog
-rw-r--r-- 1 root root 4096 Nov 18 14:53 uevent

>
> Also, you can test if the watchdog works with "sudo cat /dev/watchdog",
> assuming the watchdog daemon is not running. The watchdog works if the
> system reboots after the watchdog times out (/sys/class/watchdog/watchdog0/timeout
> is the timeout in seconds).
Running 'sudo cat /dev/watchdog' rebooted my system as expected. I don't
see the timeout node, so how do I configure the timeout value?
>
> >>
> >> Note though that this won't help to debug the problem. A hardware
> >> watchdog resets the system. It helps to recover, but it is not intended
> >> to help with debugging.
> > How do I use the hardware watchdog to reset my system when it is
> > frozen? That would help me collect a crash dump and ultimately find
> > the root cause of the system freeze.
> >
> There won't be a crashdump. It just hard-resets the system.
So is there any other way to capture a crash dump, or to trigger a
soft reboot, once the kernel has locked up?
>
> Guenter



-- 
Thanks,
Sekhar


* Re: watchdog: how to enable?
  2019-11-18  9:52           ` Muni Sekhar
@ 2019-11-18 14:10             ` Guenter Roeck
  2019-11-18 15:07               ` Muni Sekhar
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2019-11-18 14:10 UTC
  To: Muni Sekhar; +Cc: linux-watchdog, linux-pci, wim

On 11/18/19 1:52 AM, Muni Sekhar wrote:
> On Sun, Nov 17, 2019 at 3:12 AM Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On 11/16/19 10:34 AM, Muni Sekhar wrote:
>>> On Sat, Nov 16, 2019 at 9:31 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>
>>>> On 11/15/19 7:03 PM, Muni Sekhar wrote:
>>>> [ ... ]
>>>>>>
>>>>>> Another possibility, of course, might be to enable a hardware watchdog
>>>>>> in your system (assuming it supports one). I personally would not trust
>>>>>> the NMI watchdog to detect a system hang because, after all, there are
>>>>>> situations where even NMIs no longer work.
>>>>>
>>>>> From dmesg, is it possible to know whether my system supports a
>>>>> hardware watchdog or not?
>>>>> Assuming that my system supports a hardware watchdog, how do I
>>>>> enable it to debug the system freeze issues?
>>>>>
>>>>
>>>> Hardware watchdog support really depends on the board type. Most PC
>>>> mainboards support a watchdog in the Super-IO chip, but on some it is
>>>> not wired correctly. On embedded boards it is often built into the SoC.
>>>> The easiest way to see if you have a watchdog would be to check for the
>>>> existence of /dev/watchdog. However, on a PC that would most likely
>>>> not be there because the necessary module is not auto-loaded.
>>>> If you tell us your board type, or better the Super-IO chip on the board,
>>>> we might be able to help.
>>>
>>> I have two systems with the same configuration. On one I installed
>>> the vanilla kernel, and I see the /dev/watchdog and /dev/watchdog0
>>> nodes. The other runs the Ubuntu distribution kernel, and there I
>>> don't see any watchdog device node. So it looks like I need to load
>>> the kernel module manually on the distro kernel. Is there a way to
>>> find out which kernel module corresponds to the /dev/watchdog node?
>>>
>>> # ls -l /dev/watchdog*
>>> crw------- 1 root root  10, 130 Nov 15 17:15 /dev/watchdog
>>> crw------- 1 root root 248,   0 Nov 15 17:15 /dev/watchdog0
>>>
>>> # ps -ax | grep watchdog
>>>     678 ?        S      0:00 [watchdogd]
>>>
>>> Regarding the Super-IO chip, how do I find out its model?
>>>
>> You could try to run sensors-detect (from the "lm-sensors" package).
>>
>> If you can boot a system with /dev/watchdog0, you should see the type
>> in /sys/class/watchdog/watchdog0/identity.
> I could not find the /sys/class/watchdog/watchdog0/identity and
> /sys/class/watchdog/watchdog0/timeout files.
> $ ls -l /sys/class/watchdog/watchdog0/
> total 0
> -r--r--r-- 1 root root 4096 Nov 18 15:12 dev
> lrwxrwxrwx 1 root root    0 Nov 18 15:12 device -> ../../../iTCO_wdt.0.auto
> drwxr-xr-x 2 root root    0 Nov 18 15:12 power
> lrwxrwxrwx 1 root root    0 Nov 18 14:53 subsystem -> ../../../../../../class/watchdog
> -rw-r--r-- 1 root root 4096 Nov 18 14:53 uevent
> 

Presumably CONFIG_WATCHDOG_SYSFS is not enabled in your configuration.

>>
>> Also, you can test if the watchdog works with "sudo cat /dev/watchdog",
>> assuming the watchdog daemon is not running. The watchdog works if the
>> system reboots after the watchdog times out (/sys/class/watchdog/watchdog0/timeout
>> is the timeout in seconds).
> Running 'sudo cat /dev/watchdog' rebooted my system as expected. I don't
> see the timeout node, so how do I configure the timeout value?

sudo apt-get install watchdog
man watchdog

should tell you. Alternatively, enable CONFIG_WATCHDOG_SYSFS.
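Two places where a timeout can be requested without sysfs, both worth double-checking on the target system: the watchdog daemon reads `watchdog-device`, `watchdog-timeout` and the ping `interval` from /etc/watchdog.conf (see `man watchdog.conf`), and the iTCO_wdt driver takes a `heartbeat` module parameter in seconds (see `modinfo iTCO_wdt`), set in a modprobe.d file when it is a module or as `iTCO_wdt.heartbeat=` on the kernel command line when it is built in. A minimal sketch that only prints the corresponding lines; `wd_timeout_config` is hypothetical, the modprobe.d file name is arbitrary, and the driver enforces its own valid range.

```sh
#!/bin/sh
# Hypothetical helper: print configuration lines requesting a watchdog
# timeout. Nothing is written to disk.
#   $1 = timeout (s), $2 = daemon ping interval (s), $3 = device node
wd_timeout_config() {
    wt_timeout=$1
    wt_interval=$2
    wt_dev=${3:-/dev/watchdog}

    # The daemon has to ping well before the timer runs out.
    if [ "$wt_interval" -ge "$wt_timeout" ]; then
        echo "interval ($wt_interval) must be below timeout ($wt_timeout)" >&2
        return 1
    fi

    echo "# /etc/watchdog.conf (watchdog daemon)"
    echo "watchdog-device = $wt_dev"
    echo "watchdog-timeout = $wt_timeout"
    echo "interval = $wt_interval"
    echo "# /etc/modprobe.d/iTCO_wdt.conf (if iTCO_wdt is a module)"
    echo "options iTCO_wdt heartbeat=$wt_timeout"
    echo "# kernel command line (if iTCO_wdt is built in)"
    echo "iTCO_wdt.heartbeat=$wt_timeout"
}
```

For example, `wd_timeout_config 60 10` prints settings for a 60-second timeout with the daemon pinging every 10 seconds.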

>>
>>>>
>>>> Note though that this won't help to debug the problem. A hardware
>>>> watchdog resets the system. It helps to recover, but it is not intended
>>>> to help with debugging.
>>> How do I use the hardware watchdog to reset my system when it is
>>> frozen? That would help me collect a crash dump and ultimately find
>>> the root cause of the system freeze.
>>>
>> There won't be a crashdump. It just hard-resets the system.
> So is there any other way to capture a crash dump, or to trigger a
> soft reboot, once the kernel has locked up?

Not that I know of. I suspect, though, that you either have a hard lockup
where even NMI is non-operational, or NMI doesn't work in your system
to start with.

If you have nmi_watchdog=1 in your kernel command line, /proc/interrupts
should show a non-zero number of NMI interrupts. Do you see that in your system?
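A quick way to check this, assuming the usual /proc/interrupts layout in which the "NMI:" row lists one count per CPU followed by a description: sum the row, wait a while, and sum it again. While the perf-based hard lockup detector is active the total would be expected to keep growing. A minimal sketch; `nmi_total` and `nmi_delta` are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helper: sum the per-CPU counts on the "NMI:" row of
# /proc/interrupts (or of the file given as $1). Prints 0 if there is no row.
nmi_total() {
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) sum += $i   # skip the text description
         }
         END { print sum + 0 }' "${1:-/proc/interrupts}"
}

# Hypothetical helper: report how much the NMI total grew over $1 seconds.
nmi_delta() {
    nd_before=$(nmi_total)
    sleep "${1:-30}"
    nd_after=$(nmi_total)
    echo "NMIs in ${1:-30}s: $((nd_after - nd_before))"
}
```

For the row ` NMI:       4129       4153       4192        183   Non-maskable interrupts`, `nmi_total` prints 12657; `nmi_delta 60` then shows whether that number moves.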

Guenter


* Re: watchdog: how to enable?
  2019-11-16  0:35 watchdog: how to enable? Muni Sekhar
  2019-11-16  1:04 ` Guenter Roeck
@ 2019-11-18 14:38 ` Bjorn Helgaas
  2019-11-18 14:41   ` Bjorn Helgaas
  2019-11-18 15:09   ` Muni Sekhar
  1 sibling, 2 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2019-11-18 14:38 UTC
  To: Muni Sekhar; +Cc: linux-watchdog, wim, linux

[-cc linux-pci (nothing here is PCI-specific)]

On Sat, Nov 16, 2019 at 06:05:05AM +0530, Muni Sekhar wrote:
> My kernel is built with the following options:
> 
> $ cat /boot/config-5.0.1 | grep NO_HZ
> CONFIG_NO_HZ_COMMON=y
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_NO_HZ=y
> CONFIG_RCU_FAST_NO_HZ=y
> 
> I booted with the watchdog enabled (nmi_watchdog=1) as shown below:
> 
> BOOT_IMAGE=/boot/vmlinuz-5.0.1
> root=UUID=f65454ae-3f1d-4b9e-b4be-74a29becbe1e ro debug
> ignore_loglevel console=ttyUSB0,115200 console=tty0 console=tty1
> console=ttyS2,115200 memmap=1M!1023M nmi_watchdog=1
> crashkernel=384M-:128M
> 
> When the system is frozen or the kernel is locked up (I noticed that in
> this state the kernel does not respond to ALT-SysRq-<command key>), the
> watchdog is not triggered. So I want to understand how to enable the
> watchdog timer and how to verify its basic functionality.

I don't know much about the watchdog, but I assume you've found these
already?

  Documentation/admin-guide/lockup-watchdogs.rst
  Documentation/admin-guide/sysctl/kernel.rst

Do you have CONFIG_HAVE_NMI_WATCHDOG=y?  (See arch/Kconfig)


* Re: watchdog: how to enable?
  2019-11-18 14:38 ` Bjorn Helgaas
@ 2019-11-18 14:41   ` Bjorn Helgaas
  2019-11-18 15:09   ` Muni Sekhar
  1 sibling, 0 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2019-11-18 14:41 UTC
  To: Muni Sekhar; +Cc: linux-watchdog, wim, linux

On Mon, Nov 18, 2019 at 08:38:38AM -0600, Bjorn Helgaas wrote:
> ...
[facepalm, should have read the rest of the thread before cluttering
it, sorry]


* Re: watchdog: how to enable?
  2019-11-18 14:10             ` Guenter Roeck
@ 2019-11-18 15:07               ` Muni Sekhar
  0 siblings, 0 replies; 14+ messages in thread
From: Muni Sekhar @ 2019-11-18 15:07 UTC
  To: Guenter Roeck; +Cc: linux-watchdog, wim, Bjorn Helgaas

On Mon, Nov 18, 2019 at 7:40 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 11/18/19 1:52 AM, Muni Sekhar wrote:
> > On Sun, Nov 17, 2019 at 3:12 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >>
> >> On 11/16/19 10:34 AM, Muni Sekhar wrote:
> >>> On Sat, Nov 16, 2019 at 9:31 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>>>
> >>>> On 11/15/19 7:03 PM, Muni Sekhar wrote:
> >>>> [ ... ]
> >>>>>>
> >>>>>> Another possibility, of course, might be to enable a hardware watchdog
> >>>>>> in your system (assuming it supports one). I personally would not trust
> >>>>>> the NMI watchdog to detect a system hang because, after all, there are
> >>>>>> situations where even NMIs no longer work.
> >>>>>
> >>>>> From dmesg, is it possible to know whether my system supports a
> >>>>> hardware watchdog or not?
> >>>>> Assuming that my system supports a hardware watchdog, how do I
> >>>>> enable it to debug the system freeze issues?
> >>>>>
> >>>>
> >>>> Hardware watchdog support really depends on the board type. Most PC
> >>>> mainboards support a watchdog in the Super-IO chip, but on some it is
> >>>> not wired correctly. On embedded boards it is often built into the SoC.
> >>>> The easiest way to see if you have a watchdog would be to check for the
> >>>> existence of /dev/watchdog. However, on a PC that would most likely
> >>>> not be there because the necessary module is not auto-loaded.
> >>>> If you tell us your board type, or better the Super-IO chip on the board,
> >>>> we might be able to help.
> >>>
> >>> I have two systems with the same configuration. On one I installed
> >>> the vanilla kernel, and I see the /dev/watchdog and /dev/watchdog0
> >>> nodes. The other runs the Ubuntu distribution kernel, and there I
> >>> don't see any watchdog device node. So it looks like I need to load
> >>> the kernel module manually on the distro kernel. Is there a way to
> >>> find out which kernel module corresponds to the /dev/watchdog node?
> >>>
> >>> # ls -l /dev/watchdog*
> >>> crw------- 1 root root  10, 130 Nov 15 17:15 /dev/watchdog
> >>> crw------- 1 root root 248,   0 Nov 15 17:15 /dev/watchdog0
> >>>
> >>> # ps -ax | grep watchdog
> >>>     678 ?        S      0:00 [watchdogd]
> >>>
> >>> Regarding the Super-IO chip, how do I find out its model?
> >>>
> >> You could try to run sensors-detect (from the "lm-sensors" package).
> >>
> >> If you can boot a system with /dev/watchdog0, you should see the type
> >> in /sys/class/watchdog/watchdog0/identity.
> > I could not find the /sys/class/watchdog/watchdog0/identity and
> > /sys/class/watchdog/watchdog0/timeout files.
> > $ ls -l /sys/class/watchdog/watchdog0/
> > total 0
> > -r--r--r-- 1 root root 4096 Nov 18 15:12 dev
> > lrwxrwxrwx 1 root root    0 Nov 18 15:12 device -> ../../../iTCO_wdt.0.auto
> > drwxr-xr-x 2 root root    0 Nov 18 15:12 power
> > lrwxrwxrwx 1 root root    0 Nov 18 14:53 subsystem -> ../../../../../../class/watchdog
> > -rw-r--r-- 1 root root 4096 Nov 18 14:53 uevent
> >
>
> Presumably CONFIG_WATCHDOG_SYSFS is not enabled in your configuration.
>
> >>
> >> Also, you can test if the watchdog works with "sudo cat /dev/watchdog",
> >> assuming the watchdog daemon is not running. The watchdog works if the
> >> system reboots after the watchdog times out (/sys/class/watchdog/watchdog0/timeout
> >> is the timeout in seconds).
> > Running 'sudo cat /dev/watchdog' rebooted my system as expected. I don't
> > see the timeout node, so how do I configure the timeout value?
>
> sudo apt-get install watchdog
> man watchdog
>
> should tell you. Alternatively, enable CONFIG_WATCHDOG_SYSFS.
>
> >>
> >>>>
> >>>> Note though that this won't help to debug the problem. A hardware
> >>>> watchdog resets the system. It helps to recover, but it is not intended
> >>>> to help with debugging.
> >>> How do I use the hardware watchdog to reset my system when it is
> >>> frozen? That would help me collect a crash dump and ultimately find
> >>> the root cause of the system freeze.
> >>>
> >> There won't be a crashdump. It just hard-resets the system.
> > So is there any other way to capture a crash dump, or to trigger a
> > soft reboot, once the kernel has locked up?
>
> Not that I know of. I suspect, though, that you either have a hard lockup
> where even NMI is non-operational, or NMI doesn't work in your system
> to start with.
>
> If you have nmi_watchdog=1 in your kernel command line, /proc/interrupts
> should show a non-zero number of NMI interrupts. Do you see that in your system?

Yes, I see a non-zero number. When is it (the NMI interrupt count) supposed to change?

$ cat /proc/interrupts | grep NMI
 NMI:       4129       4153       4192        183   Non-maskable interrupts

$ dmesg | grep NMI
[    0.402175] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[    0.402199] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[    0.402220] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[    0.402242] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
[    4.636467] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[    4.658289] | NMI testsuite:
[   13.863284] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 9.744 msecs

I also enabled pstore/ramoops. While testing the hardware watchdog by
running 'sudo cat /dev/watchdog', I see that the console dump is saved
and available after the next boot. I see the same behavior consistently.

$ cat /sys/fs/pstore/console-ramoops-0
[  293.462623] printk: console [pstore-1] enabled
[  293.471026] pstore: Registered ramoops as persistent store backend
[  293.477800] ramoops: using 0x100000@0x3ff00000, ecc: 16
[  315.461263] systemd-journald[1665]: Sent WATCHDOG=1 notification.
[  317.447791] watchdog: watchdog0: nowayout prevents watchdog being stopped!
[  317.456616] watchdog: watchdog0: watchdog did not stop!
No errors detected
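The ramoops line above reports its region as 0x100000@0x3ff00000, which appears to be exactly the 1 MiB reserved at the 1023 MiB mark by `memmap=1M!1023M` on the kernel command line: 1M is 0x100000 bytes and 1023 × 0x100000 = 0x3ff00000. A small sketch of that conversion, assuming only K/M/G suffixes; `to_bytes` and `memmap_to_hex` are hypothetical helpers.

```sh
#!/bin/sh
# Hypothetical helper: turn a size with an optional K/M/G suffix into bytes.
to_bytes() {
    case $1 in
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Hypothetical helper: print a memmap=SIZE!START region in the
# "size@start" hex form used by the ramoops boot message.
memmap_to_hex() {
    mm_size=${1%%!*}     # text before the '!'
    mm_start=${1#*!}     # text after the '!'
    printf '0x%x@0x%x\n' "$(to_bytes "$mm_size")" "$(to_bytes "$mm_start")"
}
```

`memmap_to_hex '1M!1023M'` prints `0x100000@0x3ff00000`; the single quotes keep an interactive shell from treating `!` as history expansion.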

Now I installed the watchdog daemon and started the service before
the kernel locked up. After triggering a few tests, the kernel locked up and
the hardware watchdog triggered the reset, but in this case I don't see the
console-ramoops-0 file. The only difference is that this time the 'watchdog'
daemon armed the hardware watchdog. Any idea why the console dump is not
preserved in this scenario?
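
For reference, the userspace side of the watchdog API (Documentation/watchdog/watchdog-api.rst) is small:

- Opening /dev/watchdog arms the timer.
- Each write pings it.
- Writing the magic character 'V' just before close asks the driver to stop the timer.

A driver with nowayout set refuses to stop, which is what the "nowayout prevents watchdog being stopped!" message above reports. The `sudo cat /dev/watchdog` test opens and closes the device without sending 'V', so the timer keeps running until the hardware resets the board.

The sketch below mimics what a watchdog daemon does. `pet_watchdog` is a hypothetical helper. Its device argument lets it be tried against a plain file, because pointing it at a real watchdog device can reboot the machine.

```sh
#!/bin/sh
# Ping a watchdog device COUNT times, INTERVAL seconds apart, then try a
# magic close. WARNING: on a real /dev/watchdog, stopping the pings (or
# a nowayout driver) means the hardware will eventually reset the machine.
# Usage: pet_watchdog DEVICE COUNT INTERVAL
pet_watchdog() {
    dev=$1 count=$2 interval=$3
    {
        i=0
        while [ "$i" -lt "$count" ]; do
            printf '.'          # any byte other than 'V' is a keepalive ping
            i=$((i + 1))
            sleep "$interval"
        done
        printf 'V'              # magic character: request stop on close
    } > "$dev"                  # device opened (timer armed) once, closed at end
}
```

The device is opened once for the whole loop, as a daemon would do. Reopening it for every ping would also work, but it would print the "did not stop" warning on each close.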


>
> Guenter



-- 
Thanks,
Sekhar


* Re: watchdog: how to enable?
  2019-11-18 14:38 ` Bjorn Helgaas
  2019-11-18 14:41   ` Bjorn Helgaas
@ 2019-11-18 15:09   ` Muni Sekhar
  2019-11-22 10:59     ` Guenter Roeck
  1 sibling, 1 reply; 14+ messages in thread
From: Muni Sekhar @ 2019-11-18 15:09 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-watchdog, wim, Guenter Roeck

On Mon, Nov 18, 2019 at 8:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [-cc linux-pci (nothing here is PCI-specific)]
>
> On Sat, Nov 16, 2019 at 06:05:05AM +0530, Muni Sekhar wrote:
> > My kernel is built with the following options:
> >
> > $ cat /boot/config-5.0.1 | grep NO_HZ
> > CONFIG_NO_HZ_COMMON=y
> > CONFIG_NO_HZ_IDLE=y
> > # CONFIG_NO_HZ_FULL is not set
> > CONFIG_NO_HZ=y
> > CONFIG_RCU_FAST_NO_HZ=y
> >
> > I booted with watchdog enabled(nmi_watchdog=1) as given below:
> >
> > BOOT_IMAGE=/boot/vmlinuz-5.0.1
> > root=UUID=f65454ae-3f1d-4b9e-b4be-74a29becbe1e ro debug
> > ignore_loglevel console=ttyUSB0,115200 console=tty0 console=tty1
> > console=ttyS2,115200 memmap=1M!1023M nmi_watchdog=1
> > crashkernel=384M-:128M
> >
> > When the system is frozen or the kernel is locked up(I noticed that in
> > this state kernel is not responding for ALT-SysRq-<command key>) but
> > watchdog is not triggered. So I want to understand how to enable the
> > watchdog timer and how to verify the basic watchdog functionality
> > behavior?
>
> I don't know much about the watchdog, but I assume you've found these
> already?
>
>   Documentation/admin-guide/lockup-watchdogs.rst
>   Documentation/admin-guide/sysctl/kernel.rst
>
> Do you have CONFIG_HAVE_NMI_WATCHDOG=y?  (See arch/Kconfig)

I don’t have CONFIG_HAVE_NMI_WATCHDOG in kernel .config file.

$cat /boot/config-5.0.1 | grep CONFIG_HAVE_NMI_WATCHDOG

I tried to enable CONFIG_HAVE_NMI_WATCHDOG via menuconfig, but could
not find it. What is the role of CONFIG_HAVE_NMI_WATCHDOG?

Symbol: HAVE_NMI_WATCHDOG [=n]
Type  : bool
  Defined at arch/Kconfig:339
  Depends on: HAVE_NMI [=y]
  Selected by [n]:
  - HAVE_HARDLOCKUP_DETECTOR_ARCH [=n]

Symbol: HAVE_HARDLOCKUP_DETECTOR_ARCH [=n]
Type  : bool
  Defined at arch/Kconfig:346
  Selects: HAVE_NMI_WATCHDOG [=n]
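
As the menuconfig output shows, HAVE_NMI_WATCHDOG is only selected by HAVE_HARDLOCKUP_DETECTOR_ARCH. In kernels of this era, x86 appears to get its hard-lockup detector from the perf-based implementation (HARDLOCKUP_DETECTOR_PERF) instead. That would fit the "Permanently consumes one hw-PMU counter" dmesg line earlier in the thread.

A quicker way to see which lockup-detector options a kernel was built with is to grep its config for the relevant symbols. `lockup_config` is a hypothetical helper. The symbol list is a hedged selection from lib/Kconfig.debug and arch/Kconfig and may need adjusting for other kernel versions.

```sh
#!/bin/sh
# Report how a kernel config sets the lockup-detector symbols.
# $1: config file (defaults to /boot/config-$(uname -r)).
lockup_config() {
    cfg=${1:-/boot/config-$(uname -r)}
    for sym in LOCKUP_DETECTOR SOFTLOCKUP_DETECTOR HARDLOCKUP_DETECTOR \
               HARDLOCKUP_DETECTOR_PERF HAVE_PERF_EVENTS_NMI \
               HAVE_NMI_WATCHDOG BOOTPARAM_HARDLOCKUP_PANIC; do
        # Match "CONFIG_SYM=..." or "# CONFIG_SYM is not set" exactly,
        # so HARDLOCKUP_DETECTOR does not also match ..._PERF.
        line=$(grep -E "^(# )?CONFIG_${sym}[= ]" "$cfg" | head -n 1)
        case $line in
            "# CONFIG_${sym} is not set") echo "$sym: not set" ;;
            CONFIG_${sym}=*)              echo "$sym: ${line#*=}" ;;
            *)                            echo "$sym: absent" ;;
        esac
    done
}
```

A promptless bool that is off, such as HAVE_NMI_WATCHDOG here, often does not appear in .config at all. The helper reports that case as "absent", which is consistent with the empty grep result above.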





-- 
Thanks,
Sekhar


* Re: watchdog: how to enable?
  2019-11-18 15:09   ` Muni Sekhar
@ 2019-11-22 10:59     ` Guenter Roeck
  2019-11-22 12:54       ` Muni Sekhar
  0 siblings, 1 reply; 14+ messages in thread
From: Guenter Roeck @ 2019-11-22 10:59 UTC (permalink / raw)
  To: Muni Sekhar, Bjorn Helgaas; +Cc: linux-watchdog, wim

On 11/18/19 7:09 AM, Muni Sekhar wrote:
> On Mon, Nov 18, 2019 at 8:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>
>> [-cc linux-pci (nothing here is PCI-specific)]
>>
>> On Sat, Nov 16, 2019 at 06:05:05AM +0530, Muni Sekhar wrote:
>>> My kernel is built with the following options:
>>>
>>> $ cat /boot/config-5.0.1 | grep NO_HZ
>>> CONFIG_NO_HZ_COMMON=y
>>> CONFIG_NO_HZ_IDLE=y
>>> # CONFIG_NO_HZ_FULL is not set
>>> CONFIG_NO_HZ=y
>>> CONFIG_RCU_FAST_NO_HZ=y
>>>
>>> I booted with watchdog enabled(nmi_watchdog=1) as given below:
>>>
>>> BOOT_IMAGE=/boot/vmlinuz-5.0.1
>>> root=UUID=f65454ae-3f1d-4b9e-b4be-74a29becbe1e ro debug
>>> ignore_loglevel console=ttyUSB0,115200 console=tty0 console=tty1
>>> console=ttyS2,115200 memmap=1M!1023M nmi_watchdog=1
>>> crashkernel=384M-:128M
>>>
>>> When the system is frozen or the kernel is locked up(I noticed that in
>>> this state kernel is not responding for ALT-SysRq-<command key>) but
>>> watchdog is not triggered. So I want to understand how to enable the
>>> watchdog timer and how to verify the basic watchdog functionality
>>> behavior?
>>
>> I don't know much about the watchdog, but I assume you've found these
>> already?
>>
>>    Documentation/admin-guide/lockup-watchdogs.rst
>>    Documentation/admin-guide/sysctl/kernel.rst
>>
>> Do you have CONFIG_HAVE_NMI_WATCHDOG=y?  (See arch/Kconfig)
> 
> I don’t have CONFIG_HAVE_NMI_WATCHDOG in kernel .config file.
> 

That would mean you don't have NMI in the first place. What is your
architecture ?

Guenter

> $cat /boot/config-5.0.1 | grep CONFIG_HAVE_NMI_WATCHDOG
> 
> I tried to enable CONFIG_HAVE_NMI_WATCHDOG via menuconfig, but could
> not find it. What is the role of CONFIG_HAVE_NMI_WATCHDOG?
> 
> Symbol: HAVE_NMI_WATCHDOG [=n]
> Type  : bool
>   Defined at arch/Kconfig:339
>   Depends on: HAVE_NMI [=y]
>   Selected by [n]:
>   - HAVE_HARDLOCKUP_DETECTOR_ARCH [=n]
>
> Symbol: HAVE_HARDLOCKUP_DETECTOR_ARCH [=n]
> Type  : bool
>   Defined at arch/Kconfig:346
>   Selects: HAVE_NMI_WATCHDOG [=n]



* Re: watchdog: how to enable?
  2019-11-22 10:59     ` Guenter Roeck
@ 2019-11-22 12:54       ` Muni Sekhar
  0 siblings, 0 replies; 14+ messages in thread
From: Muni Sekhar @ 2019-11-22 12:54 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Bjorn Helgaas, linux-watchdog, wim

On Fri, Nov 22, 2019 at 4:29 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On 11/18/19 7:09 AM, Muni Sekhar wrote:
> > On Mon, Nov 18, 2019 at 8:08 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>
> >> [-cc linux-pci (nothing here is PCI-specific)]
> >>
> >> On Sat, Nov 16, 2019 at 06:05:05AM +0530, Muni Sekhar wrote:
> >>> My kernel is built with the following options:
> >>>
> >>> $ cat /boot/config-5.0.1 | grep NO_HZ
> >>> CONFIG_NO_HZ_COMMON=y
> >>> CONFIG_NO_HZ_IDLE=y
> >>> # CONFIG_NO_HZ_FULL is not set
> >>> CONFIG_NO_HZ=y
> >>> CONFIG_RCU_FAST_NO_HZ=y
> >>>
> >>> I booted with watchdog enabled(nmi_watchdog=1) as given below:
> >>>
> >>> BOOT_IMAGE=/boot/vmlinuz-5.0.1
> >>> root=UUID=f65454ae-3f1d-4b9e-b4be-74a29becbe1e ro debug
> >>> ignore_loglevel console=ttyUSB0,115200 console=tty0 console=tty1
> >>> console=ttyS2,115200 memmap=1M!1023M nmi_watchdog=1
> >>> crashkernel=384M-:128M
> >>>
> >>> When the system is frozen or the kernel is locked up(I noticed that in
> >>> this state kernel is not responding for ALT-SysRq-<command key>) but
> >>> watchdog is not triggered. So I want to understand how to enable the
> >>> watchdog timer and how to verify the basic watchdog functionality
> >>> behavior?
> >>
> >> I don't know much about the watchdog, but I assume you've found these
> >> already?
> >>
> >>    Documentation/admin-guide/lockup-watchdogs.rst
> >>    Documentation/admin-guide/sysctl/kernel.rst
> >>
> >> Do you have CONFIG_HAVE_NMI_WATCHDOG=y?  (See arch/Kconfig)
> >
> > I don’t have CONFIG_HAVE_NMI_WATCHDOG in kernel .config file.
> >
>
> That would mean you don't have NMI in the first place. What is your
> architecture ?

My system has an “Intel(R) Atom(TM) CPU  E3845” processor, and running
‘uname -m’ gives x86_64.

/proc/interrupts gives the following NMI statistics:

$ cat /proc/interrupts | grep NMI
 NMI:       4207       4167        125   Non-maskable interrupts


>
> Guenter
>
> > $cat /boot/config-5.0.1 | grep CONFIG_HAVE_NMI_WATCHDOG
> >
> > I tried to enable CONFIG_HAVE_NMI_WATCHDOG via menuconfig, but could
> > not find it. What is the role of CONFIG_HAVE_NMI_WATCHDOG?
> >
> > Symbol: HAVE_NMI_WATCHDOG [=n]
> > Type  : bool
> >   Defined at arch/Kconfig:339
> >   Depends on: HAVE_NMI [=y]
> >   Selected by [n]:
> >   - HAVE_HARDLOCKUP_DETECTOR_ARCH [=n]
> >
> > Symbol: HAVE_HARDLOCKUP_DETECTOR_ARCH [=n]
> > Type  : bool
> >   Defined at arch/Kconfig:346
> >   Selects: HAVE_NMI_WATCHDOG [=n]
>


-- 
Thanks,
Sekhar


end of thread

Thread overview: 14+ messages
2019-11-16  0:35 watchdog: how to enable? Muni Sekhar
2019-11-16  1:04 ` Guenter Roeck
2019-11-16  3:03   ` Muni Sekhar
2019-11-16 16:01     ` Guenter Roeck
2019-11-16 18:34       ` Muni Sekhar
2019-11-16 21:42         ` Guenter Roeck
2019-11-18  9:52           ` Muni Sekhar
2019-11-18 14:10             ` Guenter Roeck
2019-11-18 15:07               ` Muni Sekhar
2019-11-18 14:38 ` Bjorn Helgaas
2019-11-18 14:41   ` Bjorn Helgaas
2019-11-18 15:09   ` Muni Sekhar
2019-11-22 10:59     ` Guenter Roeck
2019-11-22 12:54       ` Muni Sekhar
