* [PATCH 0/3] watchdog/softlockup: Make softlockup reports more reliable and useful
@ 2019-08-19 10:47 Petr Mladek
  2019-08-19 10:47 ` [PATCH 1/3] watchdog/softlockup: Preserve original timestamp when touching watchdog externally Petr Mladek
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Petr Mladek @ 2019-08-19 10:47 UTC
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra
  Cc: Laurence Oberman, Vincent Whitchurch, Michal Hocko, linux-kernel,
	Petr Mladek

( Resending this as a proper patch with updated commit messages.
  The original was
  https://lkml.kernel.org/r/20190605140954.28471-1-pmladek@suse.com )

We were analyzing logs with several softlockup reports in flush_tlb_kernel_range().
They were confusing. In particular, it was not clear whether they indicated
a deadlock, a livelock, or several separate softlockups.

It turned out that even a simple busy loop:

	while (true)
		cpu_relax();

is able to produce several softlockup reports:

  [  168.277520] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865]
  [  196.277604] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865]
  [  236.277522] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [cat:4865]
                                                              ^^^^

This patchset fixes the problem in two steps:

+ The 1st patch prevents the watchdog from being restarted from unrelated
  locations, so each softlockup is reported only once:

  [  320.248948] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cat:4916]


+ The 2nd patch helps to distinguish the possible situations by reporting
  an ongoing softlockup regularly. The reports look like:

  [  480.372418] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cat:4943]
  [  508.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 52s! [cat:4943]
  [  548.372359] watchdog: BUG: soft lockup - CPU#2 stuck for 89s! [cat:4943]
  [  576.372351] watchdog: BUG: soft lockup - CPU#2 stuck for 115s! [cat:4943]
                                                              ^^^^^

The 3rd patch provides sample code to trigger a softlockup.
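
For illustration, the difference between the two reporting behaviors can be
sketched as a user-space toy model. It is not the kernel code. The 4 s
sample period, the 20 s threshold, the assumption that every report is
followed by exactly one external touch of the watchdog, and all names
(simulate_lockup(), enum touch_policy) are hypothetical and exist only for
this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; every name and constant here is an illustrative assumption. */
enum touch_policy {
	TOUCH_RESTARTS_TIMESTAMP,	/* an external touch forgets how long the CPU was stuck */
	TOUCH_PRESERVES_TIMESTAMP,	/* an external touch only delays the next report */
};

#define SAMPLE_PERIOD	4	/* assumed seconds between watchdog timer ticks */
#define THRESHOLD	20	/* assumed seconds without scheduling before a report */

/*
 * Simulate a CPU that never schedules for total_secs seconds, assuming
 * that each report is followed by one external touch of the watchdog.
 * Stores the reported "stuck for" durations in out[] and returns the
 * number of reports produced (at most max).
 */
static size_t simulate_lockup(enum touch_policy policy, int total_secs,
			      int *out, size_t max)
{
	int stuck_since = 0;	/* last time the CPU scheduled */
	int period_start = 0;	/* start of the current reporting period */
	int touched = 0;	/* an external touch is pending */
	size_t n = 0;

	assert(out != NULL || max == 0);

	for (int now = SAMPLE_PERIOD; now <= total_secs; now += SAMPLE_PERIOD) {
		if (touched) {
			/* Handle the pending touch on this tick. */
			touched = 0;
			period_start = now;
			if (policy == TOUCH_RESTARTS_TIMESTAMP)
				stuck_since = now;
			continue;
		}
		if (now - period_start > THRESHOLD && n < max) {
			out[n++] = now - stuck_since;
			touched = 1;	/* assumed side effect of printing the report */
		}
	}
	return n;
}
```

In this model, a 90-second lockup produces three reports under either
policy. With TOUCH_RESTARTS_TIMESTAMP every report says 24 s, so the
reports look like separate softlockups. With TOUCH_PRESERVES_TIMESTAMP the
durations grow (24 s, 52 s, 80 s), which shows that it is the same
softlockup. The model makes no attempt to reproduce the exact values in the
logs above.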


Petr Mladek (3):
  watchdog/softlockup: Preserve original timestamp when touching
    watchdog externally
  watchdog/softlockup: Report the same softlockup regularly
  Test softlockup

 fs/proc/consoles.c |  5 ++++
 fs/proc/version.c  |  7 +++++
 kernel/watchdog.c  | 85 +++++++++++++++++++++++++++++++-----------------------
 3 files changed, 61 insertions(+), 36 deletions(-)

-- 
2.16.4


* [RFC 0/3] watchdog/softlockup: Make softlockup reports more reliable and useful
@ 2019-06-05 14:09 Petr Mladek
  2019-06-05 14:09 ` [PATCH 3/3] Test softlockup Petr Mladek
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Mladek @ 2019-06-05 14:09 UTC
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra
  Cc: Laurence Oberman, Vincent Whitchurch, Michal Hocko, linux-kernel,
	Petr Mladek

Hi,

we were analyzing logs with several softlockup reports in flush_tlb_kernel_range().
They were confusing. In particular, it was not clear whether they indicated
a deadlock, a livelock, or several separate softlockups.

It turned out that even a simple busy loop:

	while (true)
		cpu_relax();

is able to produce several softlockup reports:

[  168.277520] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865]
[  196.277604] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [cat:4865]
[  236.277522] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [cat:4865]


I tried to understand the tricky watchdog code and produced two patches
that should help with debugging the original bug:

   The 1st patch prevents the watchdog from being restarted from
   unrelated locations.

   The 2nd patch helps to distinguish the possible situations by
   reporting an ongoing softlockup regularly.

   The 3rd patch can be used for testing the problem.


The watchdog code might deserve even more cleanup. Anyway, I would
like to hear others' opinions first.


Petr Mladek (3):
  watchdog/softlockup: Preserve original timestamp when touching
    watchdog externally
  watchdog/softlockup: Report the same softlockup regularly
  Test softlockup

 fs/proc/consoles.c |  5 ++++
 fs/proc/version.c  |  7 +++++
 kernel/watchdog.c  | 85 +++++++++++++++++++++++++++++++-----------------------
 3 files changed, 61 insertions(+), 36 deletions(-)

-- 
2.16.4


Thread overview: 13+ messages
2019-08-19 10:47 [PATCH 0/3] watchdog/softlockup: Make softlockup reports more reliable and useful Petr Mladek
2019-08-19 10:47 ` [PATCH 1/3] watchdog/softlockup: Preserve original timestamp when touching watchdog externally Petr Mladek
2019-10-21 12:42   ` Peter Zijlstra
2019-10-21 13:04     ` Petr Mladek
2019-08-19 10:47 ` [PATCH 2/3] watchdog/softlockup: Report the same softlockup regularly Petr Mladek
2019-10-21 12:43   ` Peter Zijlstra
2019-10-21 13:40     ` Petr Mladek
2019-10-21 14:09       ` Peter Zijlstra
2019-08-19 10:47 ` [PATCH 3/3] Test softlockup Petr Mladek
2019-10-21 12:45   ` Peter Zijlstra
2019-10-21 13:06     ` Petr Mladek
2019-10-21 12:32 ` [PATCH 0/3] watchdog/softlockup: Make softlockup reports more reliable and useful Petr Mladek
  -- strict thread matches above, loose matches on Subject: below --
2019-06-05 14:09 [RFC " Petr Mladek
2019-06-05 14:09 ` [PATCH 3/3] Test softlockup Petr Mladek
