All of lore.kernel.org
 help / color / mirror / Atom feed
* NMI watchdog dump does not print on hard lockup
@ 2017-10-12 16:16 Steven Rostedt
  2017-10-12 19:26 ` Peter Zijlstra
  2017-10-13 11:14 ` Petr Mladek
  0 siblings, 2 replies; 10+ messages in thread
From: Steven Rostedt @ 2017-10-12 16:16 UTC (permalink / raw)
  To: LKML
  Cc: Petr Mladek, Sergey Senozhatsky, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar

While doing my presentation for ELC and OSS in Prague in a couple of
weeks, I notice an issue with the printk_safe logic. Then I wrote code
to see if my fears were justified.

I noticed that the NMI printks now depend on an irq_work to trigger to
flush the data out that was written by printks during the NMI. But if
the irq work can't trigger, nothing will get out of the screen.

To test this, I first added this:

	raw_spin_lock(&global_trace.start_lock);
	raw_spin_lock(&global_trace.start_lock);
	raw_spin_unlock(&global_trace.start_lock);
	raw_spin_unlock(&global_trace.start_lock);

To the write function of /sys/kernel/debug/tracing/free_buffer

That way I could trigger a lockup (this case a soft lockup) when I
wanted to.

 # echo 1 > /sys/kernel/debug/tracing/free_buffer

Sure enough, in a minute after doing this, the soft lockup
warning triggered.

Then I changed it to:

	raw_spin_lock_irq(&global_trace.start_lock);
	raw_spin_lock(&global_trace.start_lock);
	raw_spin_unlock(&global_trace.start_lock);
	raw_spin_unlock_irq(&global_trace.start_lock);

And to my surprise, the hard lockup warning triggered. But then I
noticed that the lockup was detected from another CPU. So I changed
this to:

static void lock_up_cpu(void *data)
{
	unsigned long flags;
	raw_spin_lock_irqsave(&global_trace.start_lock, flags);
	raw_spin_lock(&global_trace.start_lock);
	raw_spin_unlock(&global_trace.start_lock);
	raw_spin_unlock_irqrestore(&global_trace.start_lock, flags);
}

[..]

	on_each_cpu(lock_up_cpu, NULL, 1);

This too triggered the warning. But I noticed that the calling function
didn't hard lockup. (Not all CPUs were hard locked).

Finally I did:

	on_each_cpu(lock_up_cpu, NULL, 0);
	lock_up_cpu(tr);

And boom! It locked up (lockdep was enabled, so I could see it showing
the deadlock), but then it stopped there. No output. The NMI watchdog
will only detect hard lockups if there is at least one CPU that is
still active. This could be an issue on non SMP boxes.

We need a way to have NMI flush to consoles when a lockup is detected,
and not depend on an irq_work to do so.

I'll update my presentation to discuss this flaw ;-)

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-12 16:16 NMI watchdog dump does not print on hard lockup Steven Rostedt
@ 2017-10-12 19:26 ` Peter Zijlstra
  2017-10-13 11:14 ` Petr Mladek
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2017-10-12 19:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Petr Mladek, Sergey Senozhatsky, Andrew Morton,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar

On Thu, Oct 12, 2017 at 12:16:58PM -0400, Steven Rostedt wrote:
> We need a way to have NMI flush to consoles when a lockup is detected,
> and not depend on an irq_work to do so.

Why do you think I never use that crap? early_printk FTW ;-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-12 16:16 NMI watchdog dump does not print on hard lockup Steven Rostedt
  2017-10-12 19:26 ` Peter Zijlstra
@ 2017-10-13 11:14 ` Petr Mladek
  2017-10-13 13:18   ` Steven Rostedt
  1 sibling, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2017-10-13 11:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Sergey Senozhatsky, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar

On Thu 2017-10-12 12:16:58, Steven Rostedt wrote:
> static void lock_up_cpu(void *data)
> {
> 	unsigned long flags;
> 	raw_spin_lock_irqsave(&global_trace.start_lock, flags);
> 	raw_spin_lock(&global_trace.start_lock);
> 	raw_spin_unlock(&global_trace.start_lock);
> 	raw_spin_unlock_irqrestore(&global_trace.start_lock, flags);
> }
> 
> [..]
> 
> 	on_each_cpu(lock_up_cpu, NULL, 1);
> 
> This too triggered the warning. But I noticed that the calling function
> didn't hard lockup. (Not all CPUs were hard locked).
> 
> Finally I did:
> 
> 	on_each_cpu(lock_up_cpu, NULL, 0);
> 	lock_up_cpu(tr);
> 
> And boom! It locked up (lockdep was enabled, so I could see it showing
> the deadlock), but then it stopped there. No output. The NMI watchdog
> will only detect hard lockups if there is at least one CPU that is
> still active. This could be an issue on non SMP boxes.
> 
> We need a way to have NMI flush to consoles when a lockup is detected,
> and not depend on an irq_work to do so.


I thought that enabling CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE
could help. panic() flushes the printk_save buffers, see
printk_safe_flush_on_panic(). But it somehow does not help.
I need to dig more into it.

In general, we could either improve detection of situations when
the entire system is locked. It would be a reason to risk calling
consoles even in NMI.

Or we could accept that the "default" printk is not good for all
situations and allow more special "debugging" modes:

	   + Peter's force_early_printk stuff

	   + Allow to disable printk_safe and printk_safe_nmi.
	     There will be a risk of a deadlock caused by printk.
	     But there also will be a chance to see the messages.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-13 11:14 ` Petr Mladek
@ 2017-10-13 13:18   ` Steven Rostedt
  2017-10-13 19:12     ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2017-10-13 13:18 UTC (permalink / raw)
  To: Petr Mladek
  Cc: LKML, Sergey Senozhatsky, Peter Zijlstra, Andrew Morton,
	Linus Torvalds, Thomas Gleixner, Ingo Molnar

On Fri, 13 Oct 2017 13:14:44 +0200
Petr Mladek <pmladek@suse.com> wrote:

> In general, we could either improve detection of situations when
> the entire system is locked. It would be a reason to risk calling
> consoles even in NMI.
> 
> Or we could accept that the "default" printk is not good for all
> situations and allow more special "debugging" modes:
> 
> 	   + Peter's force_early_printk stuff
> 
> 	   + Allow to disable printk_safe and printk_safe_nmi.
> 	     There will be a risk of a deadlock caused by printk.
> 	     But there also will be a chance to see the messages.

Or add the following case: The watchdog triggers, does the print, then
if it triggers again in a certain amount of time, and the print still
hasn't been flushed, the flush happens then.

-- Steve

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-13 13:18   ` Steven Rostedt
@ 2017-10-13 19:12     ` Linus Torvalds
  2017-10-16 11:12       ` Petr Mladek
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2017-10-13 19:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, LKML, Sergey Senozhatsky, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner, Ingo Molnar

On Fri, Oct 13, 2017 at 6:18 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Or add the following case: The watchdog triggers, does the print, then
> if it triggers again in a certain amount of time, and the print still
> hasn't been flushed, the flush happens then.

By the time 40 sec has passed, I suspect most people have just
rebooted the machine.

I think an NMI watchdog should just force the flush - the same way an
oops should. Deadlocks aren't really relevant if something doesn't get
printed out anyway.

               Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-13 19:12     ` Linus Torvalds
@ 2017-10-16 11:12       ` Petr Mladek
  2017-10-16 13:13         ` Sergey Senozhatsky
  0 siblings, 1 reply; 10+ messages in thread
From: Petr Mladek @ 2017-10-16 11:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, LKML, Sergey Senozhatsky, Peter Zijlstra,
	Andrew Morton, Thomas Gleixner, Ingo Molnar

On Fri 2017-10-13 12:12:29, Linus Torvalds wrote:
> On Fri, Oct 13, 2017 at 6:18 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Or add the following case: The watchdog triggers, does the print, then
> > if it triggers again in a certain amount of time, and the print still
> > hasn't been flushed, the flush happens then.

Sounds good to me.

> By the time 40 sec has passed, I suspect most people have just
> rebooted the machine.

This might be the case for a desktop. But people might be more
conservative in case of big servers or when debugging. These might
be desperate to keep going or see something.

> I think an NMI watchdog should just force the flush - the same way an
> oops should. Deadlocks aren't really relevant if something doesn't get
> printed out anyway.

We expicititely flush the NMI buffers in panic() when there is
not other way to see them. But it is questional in other situations.
Sometimes the flush might be the only way to see the messages
and sometimes printk() might unnecessarily cause a deadlock.
IMHO, the only solution is to make it optional.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-16 11:12       ` Petr Mladek
@ 2017-10-16 13:13         ` Sergey Senozhatsky
  2017-10-16 14:15           ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Sergey Senozhatsky @ 2017-10-16 13:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Linus Torvalds, Steven Rostedt, LKML, Sergey Senozhatsky,
	Peter Zijlstra, Andrew Morton, Thomas Gleixner, Ingo Molnar

Hello,

On (10/16/17 13:12), Petr Mladek wrote:
[..]
> > I think an NMI watchdog should just force the flush - the same way an
> > oops should. Deadlocks aren't really relevant if something doesn't get
> > printed out anyway.
> 
> We expicititely flush the NMI buffers in panic() when there is
> not other way to see them. But it is questional in other situations.
> Sometimes the flush might be the only way to see the messages
> and sometimes printk() might unnecessarily cause a deadlock.
> IMHO, the only solution is to make it optional.

just "brainstorming" it... with some silly ideas.

pushing the data from NMI panic might look like we are replacing one
deadlock scenario with another deadlock scenario. some of the console
drivers are soooo complex internally. so I have been thinking about...
may be we can extend struct console and add ->write_on_panic() and that
handler must be as lockless as possible; so lockless that calling it
from anything that is not panic() is a severe bug.

an absolutely trivial case,
if serial console does

	console_write_cb(struct console *co, const char *s, unsigned int count)
	{
		spin_lock_irqsave(&port->lock, flags);
		uart_console_write(s, count, console_putchar);
		spin_unlock_irqrestore(&port->lock, flags);
	}

then panic callback can look like

	console_write_on_panic_cb(struct console *co, const char *s, unsigned int count)
	{
		/* no, we don't take the port lock here */
		uart_console_write(s, count, console_putchar);
	}

a less trivial case might look more involved. but in general that
write_on_panic() callback must do the absolute minimum of work. so
it's sort of a early console, but as part of normal console driver.

I also got some other serial console crazy ideas, but they are not
related to this topic.

	-ss

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-16 13:13         ` Sergey Senozhatsky
@ 2017-10-16 14:15           ` Steven Rostedt
  2017-10-17  7:50             ` Sergey Senozhatsky
  2018-10-23  6:49             ` Sergey Senozhatsky
  0 siblings, 2 replies; 10+ messages in thread
From: Steven Rostedt @ 2017-10-16 14:15 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Petr Mladek, Linus Torvalds, LKML, Peter Zijlstra, Andrew Morton,
	Thomas Gleixner, Ingo Molnar

On Mon, 16 Oct 2017 22:13:05 +0900
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> just "brainstorming" it... with some silly ideas.
> 
> pushing the data from NMI panic might look like we are replacing one
> deadlock scenario with another deadlock scenario. some of the console
> drivers are soooo complex internally. so I have been thinking about...
> may be we can extend struct console and add ->write_on_panic() and that
> handler must be as lockless as possible; so lockless that calling it
> from anything that is not panic() is a severe bug.

This may not be a bad idea. And make it so it can't be called unless we
are in panic mode (or at least "oops in progress").

If oops_in_progress is set, and the console has a "write_on_panic"
handler, then just call that.

Heck, if it doesn't have one, and early_printk is defined, then perhaps
that should be the default "write_on_panic" output?

-- Steve


> 
> an absolutely trivial case,
> if serial console does
> 
> 	console_write_cb(struct console *co, const char *s, unsigned int count)
> 	{
> 		spin_lock_irqsave(&port->lock, flags);
> 		uart_console_write(s, count, console_putchar);
> 		spin_unlock_irqrestore(&port->lock, flags);
> 	}
> 
> then panic callback can look like
> 
> 	console_write_on_panic_cb(struct console *co, const char *s, unsigned int count)
> 	{
> 		/* no, we don't take the port lock here */
> 		uart_console_write(s, count, console_putchar);
> 	}
> 
> a less trivial case might look more involved. but in general that
> write_on_panic() callback must do the absolute minimum of work. so
> it's sort of a early console, but as part of normal console driver.
> 
> I also got some other serial console crazy ideas, but they are not
> related to this topic.
> 
> 	-ss

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-16 14:15           ` Steven Rostedt
@ 2017-10-17  7:50             ` Sergey Senozhatsky
  2018-10-23  6:49             ` Sergey Senozhatsky
  1 sibling, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2017-10-17  7:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Linus Torvalds, LKML,
	Peter Zijlstra, Andrew Morton, Thomas Gleixner, Ingo Molnar

On (10/16/17 10:15), Steven Rostedt wrote:
[..]
> > just "brainstorming" it... with some silly ideas.
> > 
> > pushing the data from NMI panic might look like we are replacing one
> > deadlock scenario with another deadlock scenario. some of the console
> > drivers are soooo complex internally. so I have been thinking about...
> > may be we can extend struct console and add ->write_on_panic() and that
> > handler must be as lockless as possible; so lockless that calling it
> > from anything that is not panic() is a severe bug.
> 
> This may not be a bad idea. And make it so it can't be called unless we
> are in panic mode (or at least "oops in progress").

right.

we used to have that zap_locks() function, which used to re-init printk()
internal locks on panic (printk recursion while in panic, to be exact):
logbuf spin_lock and console_sem. I wasn't to fond of this function, it
was missing the fact that on panic every printk() is a direct printk (at
least we have such expectation), IOW, it involves
	console_unlock()->call_console_drivers()

so punching printk()'s locks and leaving console drivers' locks intact
was not fair. at all. so, to improve the situation, I removed zap_locks().
/* kidding */


we have sort of re-entrant printk() now. but not completely re-entrant,
because console drivers are not re-entrant. so we can do

a) add ->zap_locks() callback to console drivers

   each console (which wants to be useful) can re-init its locks there, we
   will call it from panic() only. but, given how complex some of the
   consoles, I'd much rather prefer

b) add ->write_on_panic() callback to console drivers

   and do a barely legal print out there


I don't expect/want/push for/etc every console driver to implement
->write_on_panic() callback, just several most commonly used ones.
basically, the ones that you and PeterZ are using.


we also can split our flush_on_panic() and factor out the most
important part of console_unlock(). the first flush_on_panic(), let's
call it flush_on_panic_immediately() or whatever we name it, can push
messages only to those console drivers that have ->write_on_panic()
enabled. and it must call factored out part of console_unlock(). we
don't want flush_on_panic_immediately() to attempt up() the console
semaphore, because this can deadlock. so that factored out __console_unlock()
won't care about console_sem at all.


the second flush_on_panic() can push the data to all registered and
enabled consoles. this has chances to deadlock, but we can be less
nervous about it [given that there was at least one console with
->write_on_panic()].


> If oops_in_progress is set, and the console has a "write_on_panic"
> handler, then just call that.

yes. I don't like oops_in_progress variable, but some flag is
definitely needed.

> Heck, if it doesn't have one, and early_printk is defined, then perhaps
> that should be the default "write_on_panic" output?

yes, early_printk is a good addition. my systems have
"# CONFIG_EARLY_PRINTK is not set".

	-ss

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: NMI watchdog dump does not print on hard lockup
  2017-10-16 14:15           ` Steven Rostedt
  2017-10-17  7:50             ` Sergey Senozhatsky
@ 2018-10-23  6:49             ` Sergey Senozhatsky
  1 sibling, 0 replies; 10+ messages in thread
From: Sergey Senozhatsky @ 2018-10-23  6:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, Petr Mladek, Linus Torvalds, LKML,
	Peter Zijlstra, Andrew Morton, Thomas Gleixner, Ingo Molnar

On (10/16/17 10:15), Steven Rostedt wrote:
> On Mon, 16 Oct 2017 22:13:05 +0900
> Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> 
> > just "brainstorming" it... with some silly ideas.
> > 
> > pushing the data from NMI panic might look like we are replacing one
> > deadlock scenario with another deadlock scenario. some of the console
> > drivers are soooo complex internally. so I have been thinking about...
> > may be we can extend struct console and add ->write_on_panic() and that
> > handler must be as lockless as possible; so lockless that calling it
> > from anything that is not panic() is a severe bug.
> 
> This may not be a bad idea. And make it so it can't be called unless we
> are in panic mode (or at least "oops in progress").
> 
> If oops_in_progress is set, and the console has a "write_on_panic"
> handler, then just call that.

Good news Steven.

It turned out that some of serial consoles already have this
write_on_panic() mechanism enabled. Such consoles have the following
thing is their usual ->write() callbacks (which we call from printk()):

static void serial_console_write(struct console *co, const char *s,
                                 unsigned count)
{
...
        if (port->sysrq)
                locked = 0;
        else if (oops_in_progress)
                locked = spin_trylock_irqsave(&port->lock, flags);
        else
                spin_lock_irqsave(&port->lock, flags);
...

        uart_console_write(port, s, count, serial_console_putchar);
...
        if (locked)
                spin_unlock_irqrestore(&port->lock, flags);
}

Notice the special handling of port->sysrq and oops_in_progress cases.

So we, basically, already have "lockless on panic" serial consoles.
The problem is - it seems that panic() does not always let lockless
consoles to be lockless. I'm trying to address this in [1].

[1] lkml.kernel.org/r/20181016050428.17966-2-sergey.senozhatsky@gmail.com

	-ss

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-10-23  6:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-12 16:16 NMI watchdog dump does not print on hard lockup Steven Rostedt
2017-10-12 19:26 ` Peter Zijlstra
2017-10-13 11:14 ` Petr Mladek
2017-10-13 13:18   ` Steven Rostedt
2017-10-13 19:12     ` Linus Torvalds
2017-10-16 11:12       ` Petr Mladek
2017-10-16 13:13         ` Sergey Senozhatsky
2017-10-16 14:15           ` Steven Rostedt
2017-10-17  7:50             ` Sergey Senozhatsky
2018-10-23  6:49             ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.