linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] printk: Ratelimit messages printed by console drivers
@ 2018-04-13 12:47 Petr Mladek
  2018-04-13 14:12 ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Mladek @ 2018-04-13 12:47 UTC (permalink / raw)
  To: Steven Rostedt, Sergey Senozhatsky
  Cc: akpm, linux-mm, Peter Zijlstra, Jan Kara, Tetsuo Handa,
	Sergey Senozhatsky, Tejun Heo, linux-kernel, Petr Mladek

Error messages printed by console drivers might cause an infinite loop.
In particular, writing a message might produce another message that
need to be written, etc.

The obvious solution is to remove these messages. But there many
non-trivial console drivers. Also showing printk() messages is not
the only task of these drivers. Finally, the messages might come
from the timer, allocator, locking, or any other generic code.
As a result it is hard to catch all the problems. Not to say
that it is hard to be aware of problems and debug them without
the messages.

This patch rate-limits messages printed by console drivers from
console_unlock(). The decision is done in vprintk_func() that already
modifies printk() behavior according to the context. It uses the existing
console_owner variable to detect the context where console drivers are
called.

The burst limit is set to 100 lines so that it allows to see WARN(),
lockdep or other similar messages. The exact number is inspired by
printk_limits used by btrfs_printk().

The interval is set to one hour. It is rather arbitrary selected time.
It is supposed to be a compromise between never print these messages,
do not lockup the machine, do not fill the entire buffer too quickly,
and get information if something changes over time.

The important thing is to break a potential infinite loop. Both printk
and consoles need to get calm. This is not that easy because the messages
printed by console drivers are stored into a printk_safe buffers. They
are flushed later from IRQ context using printk_deferred(). It means
that they are flushed to the console even later by another IRQ.
Therefore there might be ping-pong scenario where flushing printk_safe
buffers wakes consoles that would fill the buffer again, etc.

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/internal.h    | 1 +
 kernel/printk/printk.c      | 2 +-
 kernel/printk/printk_safe.c | 7 +++++++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 2a7d04049af4..1633ccbd774c 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -23,6 +23,7 @@
 #define PRINTK_NMI_CONTEXT_MASK		 0x80000000
 
 extern raw_spinlock_t logbuf_lock;
+extern struct task_struct *console_owner;
 
 __printf(1, 0) int vprintk_default(const char *fmt, va_list args);
 __printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2f4af216bd6e..26f45c03d245 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1559,7 +1559,7 @@ static struct lockdep_map console_owner_dep_map = {
 #endif
 
 static DEFINE_RAW_SPINLOCK(console_owner_lock);
-static struct task_struct *console_owner;
+struct task_struct *console_owner;
 static bool console_waiter;
 
 /**
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c2004bb23..201913ae9c40 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -21,6 +21,7 @@
 #include <linux/smp.h>
 #include <linux/cpumask.h>
 #include <linux/irq_work.h>
+#include <linux/ratelimit.h>
 #include <linux/printk.h>
 
 #include "internal.h"
@@ -360,10 +361,16 @@ void __printk_safe_exit(void)
 
 __printf(1, 0) int vprintk_func(const char *fmt, va_list args)
 {
+	static DEFINE_RATELIMIT_STATE(ratelimit_console, 60 * 60 * HZ, 100);
+
 	/* Use extra buffer in NMI when logbuf_lock is taken or in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
 		return vprintk_nmi(fmt, args);
 
+	/* Prevent infinite loop caused by messages from console drivers. */
+	if (console_owner == current && !__ratelimit(&ratelimit_console))
+		return 0;
+
 	/* Use extra buffer to prevent a recursion deadlock in safe mode. */
 	if (this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK)
 		return vprintk_safe(fmt, args);
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-13 12:47 [PATCH] printk: Ratelimit messages printed by console drivers Petr Mladek
@ 2018-04-13 14:12 ` Steven Rostedt
  2018-04-14  2:35   ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2018-04-13 14:12 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Sergey Senozhatsky, Tejun Heo, linux-kernel

On Fri, 13 Apr 2018 14:47:04 +0200
Petr Mladek <pmladek@suse.com> wrote:


> The interval is set to one hour. It is rather arbitrary selected time.
> It is supposed to be a compromise between never print these messages,
> do not lockup the machine, do not fill the entire buffer too quickly,
> and get information if something changes over time.


I think an hour is incredibly long. We only allow 100 lines per hour for
printks happening inside another printk?

I think 5 minutes (at most) would probably be plenty. One minute may be
good enough.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-13 14:12 ` Steven Rostedt
@ 2018-04-14  2:35   ` Sergey Senozhatsky
  2018-04-16  1:47     ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2018-04-14  2:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Petr Mladek, Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra,
	Jan Kara, Tetsuo Handa, Sergey Senozhatsky, Tejun Heo,
	linux-kernel

On (04/13/18 10:12), Steven Rostedt wrote:
> 
> > The interval is set to one hour. It is rather arbitrary selected time.
> > It is supposed to be a compromise between never print these messages,
> > do not lockup the machine, do not fill the entire buffer too quickly,
> > and get information if something changes over time.
> 
> 
> I think an hour is incredibly long. We only allow 100 lines per hour for
> printks happening inside another printk?
> 
> I think 5 minutes (at most) would probably be plenty. One minute may be
> good enough.

Besides 100 lines is absolutely not enough for any real lockdep splat.
My call would be - up to 1000 lines in a 1 minute interval.

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-14  2:35   ` Sergey Senozhatsky
@ 2018-04-16  1:47     ` Sergey Senozhatsky
  2018-04-16  4:25       ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2018-04-16  1:47 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: akpm, linux-mm, Peter Zijlstra, Jan Kara, Tetsuo Handa,
	Sergey Senozhatsky, Tejun Heo, linux-kernel, Sergey Senozhatsky

On (04/14/18 11:35), Sergey Senozhatsky wrote:
> On (04/13/18 10:12), Steven Rostedt wrote:
> > 
> > > The interval is set to one hour. It is rather arbitrary selected time.
> > > It is supposed to be a compromise between never print these messages,
> > > do not lockup the machine, do not fill the entire buffer too quickly,
> > > and get information if something changes over time.
> > 
> > 
> > I think an hour is incredibly long. We only allow 100 lines per hour for
> > printks happening inside another printk?
> > 
> > I think 5 minutes (at most) would probably be plenty. One minute may be
> > good enough.
> 
> Besides 100 lines is absolutely not enough for any real lockdep splat.
> My call would be - up to 1000 lines in a 1 minute interval.

Well, if we want to basically turn printk_safe() into printk_safe_ratelimited().
I'm not so sure about it.

Besides the patch also rate limits printk_nmi->logbuf - the logbuf
PRINTK_NMI_DEFERRED_CONTEXT_MASK bypass, which is way too important
to rate limit it - for no reason.

Dunno, can we keep printk_safe() the way it is and introduce a new
printk_safe_ratelimited() specifically for call_console_drivers()?

Lockdep splat is a one time event, if we lose half of it - we, most
like, lose the entire report. And call_console_drivers() is not the
one and only source of warnings/errors/etc. So if we turn printk_safe
into printk_safe_ratelimited() [not sure we want to do it] for all
then I want restrictions to be as low as possible, IOW to log_store()
as many lines as possible.

Chatty console drivers is not exactly the case which printk_safe() was
meant to fix. I'm pretty sure I put call_console_drivers() under printk_safe
just because we call console_drivers with local IRQs disabled anyway and I
was too lazy to do something like this

---

@@ -2377,6 +2377,7 @@ void console_unlock(void)
 		console_idx = log_next(console_idx);
 		console_seq++;
 		raw_spin_unlock(&logbuf_lock);
+		__printk_safe_exit();
 
 		/*
 		 * While actively printing out messages, if another printk()
@@ -2390,6 +2391,7 @@ void console_unlock(void)
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
+		__printk_safe_enter();
 		if (console_lock_spinning_disable_and_check()) {
 			printk_safe_exit_irqrestore(flags);
 			return;

---

But, in general, I don't think there are real reasons for us to call
console drivers from printk_safe section: call_console_drivers()->printk()
will not deadlock, because vprintk_emit()->console_trylock_spinning() will
always fail.

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-16  1:47     ` Sergey Senozhatsky
@ 2018-04-16  4:25       ` Sergey Senozhatsky
  2018-04-19 12:53         ` Petr Mladek
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2018-04-16  4:25 UTC (permalink / raw)
  To: Steven Rostedt, Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On (04/16/18 10:47), Sergey Senozhatsky wrote:
> On (04/14/18 11:35), Sergey Senozhatsky wrote:
> > On (04/13/18 10:12), Steven Rostedt wrote:
> > > 
> > > > The interval is set to one hour. It is rather arbitrary selected time.
> > > > It is supposed to be a compromise between never print these messages,
> > > > do not lockup the machine, do not fill the entire buffer too quickly,
> > > > and get information if something changes over time.
> > > 
> > > 
> > > I think an hour is incredibly long. We only allow 100 lines per hour for
> > > printks happening inside another printk?
> > > 
> > > I think 5 minutes (at most) would probably be plenty. One minute may be
> > > good enough.
> > 
> > Besides 100 lines is absolutely not enough for any real lockdep splat.
> > My call would be - up to 1000 lines in a 1 minute interval.
> 
> Well, if we want to basically turn printk_safe() into printk_safe_ratelimited().
> I'm not so sure about it.
> 
> Besides the patch also rate limits printk_nmi->logbuf - the logbuf
> PRINTK_NMI_DEFERRED_CONTEXT_MASK bypass, which is way too important
> to rate limit it - for no reason.
> 
> Dunno, can we keep printk_safe() the way it is and introduce a new
> printk_safe_ratelimited() specifically for call_console_drivers()?
> 
> Lockdep splat is a one time event, if we lose half of it - we, most
> like, lose the entire report. And call_console_drivers() is not the
> one and only source of warnings/errors/etc. So if we turn printk_safe
> into printk_safe_ratelimited() [not sure we want to do it] for all
> then I want restrictions to be as low as possible, IOW to log_store()
> as many lines as possible.

One more thing,
I'd really prefer to rate limit the function which flushes per-CPU
printk_safe buffers; not the function that appends new messages to
the per-CPU printk_safe buffers. There is a significant difference.

printk_safe does not help us when we are dealing with any external
locks - and call_console_drivers() is precisely that type of case.
The very next thing to happen after lockdep splat, or spin_lock
debugging report, etc. can be an actual deadlock->panic(). Thus I
want to have the entire report in per-CPU buffer [if possible],
so we can flush_on_panic() per-CPU buffers, or at least move the
data to the logbuf and make it accessible in vmcore. If we rate
limit the function that appends data to the per-CPU buffer then we
may simply suppress [rate limit] the report, so there will be
nothing to flush_on_panic().

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-16  4:25       ` Sergey Senozhatsky
@ 2018-04-19 12:53         ` Petr Mladek
  2018-04-20  2:15           ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Mladek @ 2018-04-19 12:53 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Mon 2018-04-16 13:25:53, Sergey Senozhatsky wrote:
> On (04/16/18 10:47), Sergey Senozhatsky wrote:
> > On (04/14/18 11:35), Sergey Senozhatsky wrote:
> > > On (04/13/18 10:12), Steven Rostedt wrote:
> > > > 
> > > > > The interval is set to one hour. It is rather arbitrary selected time.
> > > > > It is supposed to be a compromise between never print these messages,
> > > > > do not lockup the machine, do not fill the entire buffer too quickly,
> > > > > and get information if something changes over time.
> > > > 
> > > > 
> > > > I think an hour is incredibly long. We only allow 100 lines per hour for
> > > > printks happening inside another printk?
> > > > 
> > > > I think 5 minutes (at most) would probably be plenty. One minute may be
> > > > good enough.
> > > 
> > > Besides 100 lines is absolutely not enough for any real lockdep splat.
> > > My call would be - up to 1000 lines in a 1 minute interval.

But this would break the intention of this patch. We need to flush all
messages to the console before the timeout. Otherwise we would never
break the possible infinite loop.

Come on guys! The first reaction how to fix the infinite loop was
to fix the console drivers and remove the recursive messages. We are
talking about messages that should not be there or they should
get replaced by WARN_ONCE(), print_once() or so. This patch only
give us a chance to see the problem and do not blow up immediately.

I am fine with increasing the number of lines. But we need to keep
the timeout long. In fact, 1 hour is still rather short from my POV.


> > Well, if we want to basically turn printk_safe() into printk_safe_ratelimited().
> > I'm not so sure about it.

No, it is not about printk_safe(). The ratelimit is active when
console_owner == current. It triggers when printk() is called
inside

	console_lock_spinning_enable();

	  call_console_drivers(ext_text, ext_len, text, len);
	    printk();

	console_lock_spinning_disable_and_check()

It will continue working even if we disable printk_safe() context
earlier and the messages are stored into the main log buffer.


> > Besides the patch also rate limits printk_nmi->logbuf - the logbuf
> > PRINTK_NMI_DEFERRED_CONTEXT_MASK bypass, which is way too important
> > to rate limit it - for no reason.

Again. It has the effect only when console_owner == current. It means
that it affects "only" NMIs that interrupt console_unlock() when calling
console drivers.

Anyway, it needs to get fixed. I suggest to update the check in
printk_func():

	if (console_owner == current && !in_nmi() &&
	    !__ratelimit(&ratelimit_console))
		return 0;


> One more thing,
> I'd really prefer to rate limit the function which flushes per-CPU
> printk_safe buffers; not the function that appends new messages to
> the per-CPU printk_safe buffers.

I wonder if this opinion is still valid after explaining the
dependency on printk_safe(). In each case, it sounds weird
to block printk_safe buffers with some "unwanted" messages.
Or maybe I miss something.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-19 12:53         ` Petr Mladek
@ 2018-04-20  2:15           ` Sergey Senozhatsky
  2018-04-20  9:12             ` Petr Mladek
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2018-04-20  2:15 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, akpm, linux-mm,
	Peter Zijlstra, Jan Kara, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (04/19/18 14:53), Petr Mladek wrote:
> > > > 
> > > > Besides 100 lines is absolutely not enough for any real lockdep splat.
> > > > My call would be - up to 1000 lines in a 1 minute interval.
> 
> But this would break the intention of this patch.

You picked an arbitrary value and now you are saying that any other
value will not work?

> Come on guys! The first reaction how to fix the infinite loop was
> to fix the console drivers and remove the recursive messages. We are
> talking about messages that should not be there or they should
> get replaced by WARN_ONCE(), print_once() or so. This patch only
> give us a chance to see the problem and do not blow up immediately.
> 
> I am fine with increasing the number of lines. But we need to keep
> the timeout long. In fact, 1 hour is still rather short from my POV.

Disagree.

I saw 3 or 4 lockdep reports coming from console drivers. "100 lines"
is way too restrictive. I want to have a complete report; not the first
50 lines, not the first 103 lines, which would "hint" me that "hey, there
is something wrong there, but you are on your own to figure out the rest".

> > > Well, if we want to basically turn printk_safe() into printk_safe_ratelimited().
> > > I'm not so sure about it.
> 
> No, it is not about printk_safe(). The ratelimit is active when
> console_owner == current. It triggers when printk() is called
> inside

"console_owner == current" is exactly the point when we call console
drivers and add scheduler, networking, timekeeping, etc. locks to the
picture. And so far all of the lockdeps reports that we had were from
call_console_drivers(). So it very much is about printk_safe().

> > > Besides the patch also rate limits printk_nmi->logbuf - the logbuf
> > > PRINTK_NMI_DEFERRED_CONTEXT_MASK bypass, which is way too important
> > > to rate limit it - for no reason.
> 
> Again. It has the effect only when console_owner == current. It means
> that it affects "only" NMIs that interrupt console_unlock() when calling
> console drivers.

What is your objection here? NMIs can come anytime.

> > One more thing,
> > I'd really prefer to rate limit the function which flushes per-CPU
> > printk_safe buffers; not the function that appends new messages to
> > the per-CPU printk_safe buffers.
> 
> I wonder if this opinion is still valid after explaining the
> dependency on printk_safe(). In each case, it sounds weird
> to block printk_safe buffers with some "unwanted" messages.
> Or maybe I miss something.

I'm not following.

The fact that some consoles under some circumstances can add unwanted
messages to the buffer does not look like a good enough reason to start
rate limiting _all_ messages and to potentially discard the _important_
ones.

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20  2:15           ` Sergey Senozhatsky
@ 2018-04-20  9:12             ` Petr Mladek
  2018-04-20 12:04               ` Steven Rostedt
  2018-04-23  5:21               ` Sergey Senozhatsky
  0 siblings, 2 replies; 27+ messages in thread
From: Petr Mladek @ 2018-04-20  9:12 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri 2018-04-20 11:15:11, Sergey Senozhatsky wrote:
> On (04/19/18 14:53), Petr Mladek wrote:
> > > > > 
> > > > > Besides 100 lines is absolutely not enough for any real lockdep splat.
> > > > > My call would be - up to 1000 lines in a 1 minute interval.
> > 
> > But this would break the intention of this patch.
> 
> You picked an arbitrary value and now you are saying that any other
> value will not work?

Yes, my number was arbitrary. The important thing is that it was long
enough. Or do you know about an console that will not be able to write
100 lines within one hour?

On the other hand. Let's take a classic 9600 baud console
1000 lines 80 characters long. If I count correctly,
the console would need:

      80 * 1000 * 8 / 9600 = 66.6666666 seconds

You might argue that average lines are below 80 characters.
But there eveidently is a non-trivial risk that 1000 lines
per minute ratelimiting would not help.


> > Come on guys! The first reaction how to fix the infinite loop was
> > to fix the console drivers and remove the recursive messages. We are
> > talking about messages that should not be there or they should
> > get replaced by WARN_ONCE(), print_once() or so. This patch only
> > give us a chance to see the problem and do not blow up immediately.
> > 
> > I am fine with increasing the number of lines. But we need to keep
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > the timeout long. In fact, 1 hour is still rather short from my POV.
> 
> Disagree.
> 
> I saw 3 or 4 lockdep reports coming from console drivers. "100 lines"
> is way too restrictive.

As I already wrote. I am fine with increasing the number of lines.
Will 1000 lines within 1 hour be enough for you, please?


> > > > Well, if we want to basically turn printk_safe() into printk_safe_ratelimited().
> > > > I'm not so sure about it.
> > 
> > No, it is not about printk_safe(). The ratelimit is active when
> > console_owner == current. It triggers when printk() is called
> > inside
> 
> "console_owner == current" is exactly the point when we call console
> drivers and add scheduler, networking, timekeeping, etc. locks to the
> picture. And so far all of the lockdeps reports that we had were from
> call_console_drivers(). So it very much is about printk_safe().

I am lost. In the mail
https://lkml.kernel.org/r/20180416014729.GB1034@jagdpanzerIV
you wrote:

<paste>
Chatty console drivers is not exactly the case which printk_safe() was
meant to fix. I'm pretty sure I put call_console_drivers() under printk_safe
just because we call console_drivers with local IRQs disabled anyway and I
was too lazy to do something like this
</paste>

My understanding of the older mail is that you called
console_drivers() in printk_safe() context only because it was
easier to disable printk_safe context later together with
enabling irqs.

My understanding of today's mail is that it is important
to call console drivers in printk_safe() context.

It is a contradiction. Could you please explain?


> > > > Besides the patch also rate limits printk_nmi->logbuf - the logbuf
> > > > PRINTK_NMI_DEFERRED_CONTEXT_MASK bypass, which is way too important
> > > > to rate limit it - for no reason.
> > 
> > Again. It has the effect only when console_owner == current. It means
> > that it affects "only" NMIs that interrupt console_unlock() when calling
> > console drivers.
> 
> What is your objection here? NMIs can come anytime.

Why do you completely ignore that I put "only" into quotation marks?
Why did you comment only the first paragraph and removed the
following paragraph from my reply?:

<paste>
Anyway, it needs to get fixed. I suggest to update the check in
printk_func():

	if (console_owner == current && !in_nmi() &&
	    !__ratelimit(&ratelimit_console))
		return 0;

</paste>

What is you real intention, please? Do you just want to show me as
an idiot or solve the problem? Is this some politics game?


> > > One more thing,
> > > I'd really prefer to rate limit the function which flushes per-CPU
> > > printk_safe buffers; not the function that appends new messages to
> > > the per-CPU printk_safe buffers.
> > 
> > I wonder if this opinion is still valid after explaining the
> > dependency on printk_safe(). In each case, it sounds weird
> > to block printk_safe buffers with some "unwanted" messages.
> > Or maybe I miss something.
> 
> I'm not following.
> 
> The fact that some consoles under some circumstances can add unwanted
> messages to the buffer does not look like a good enough reason to start
> rate limiting _all_ messages and to potentially discard the _important_
> ones.

Could you please read the original patch again? The ratelimiting
happens only when console_owner == current. This will be true
only if you print a message from the small context of
console_unlock() where console drivers are called?

What do you mean by _all_ messages, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20  9:12             ` Petr Mladek
@ 2018-04-20 12:04               ` Steven Rostedt
  2018-04-20 14:01                 ` Petr Mladek
  2018-04-23  5:21               ` Sergey Senozhatsky
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2018-04-20 12:04 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri, 20 Apr 2018 11:12:24 +0200
Petr Mladek <pmladek@suse.com> wrote:

> Yes, my number was arbitrary. The important thing is that it was long
> enough. Or do you know about an console that will not be able to write
> 100 lines within one hour?

The problem is the way rate limit works. If you print 100 lines (or
1000) in 5 seconds, then you just stopped printing from that context
for 59 minutes and 55 seconds. That's a long time to block printing.

What happens if you had a couple of NMIs go off that takes up that
time, and then you hit a bug 10 minutes later from that context. You
just lost it.

This is a magnitude larger than any other user of rate limit in the
kernel. The most common time is 5 seconds. The longest I can find is 1
minute. You are saying you want to block printing from this context for
60 minutes!

That is HUGE! I don't understand your rational for such a huge number.
What data do you have to back that up with?

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20 12:04               ` Steven Rostedt
@ 2018-04-20 14:01                 ` Petr Mladek
  2018-04-20 14:17                   ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Mladek @ 2018-04-20 14:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri 2018-04-20 08:04:28, Steven Rostedt wrote:
> On Fri, 20 Apr 2018 11:12:24 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > Yes, my number was arbitrary. The important thing is that it was long
> > enough. Or do you know about an console that will not be able to write
> > 100 lines within one hour?
> 
> The problem is the way rate limit works. If you print 100 lines (or
> 1000) in 5 seconds, then you just stopped printing from that context
> for 59 minutes and 55 seconds. That's a long time to block printing.

Are we talking about the same context?

I am talking about console drivers called from console_unlock(). It is
very special context because it is more or less recursive:

     + could cause infinite loop
     + the errors are usually the same again and again

As a result, if you get too many messages from this context:

     + you are lost (recursion)
     + more messages != new information

And you need to fix the problem anyway. Otherwise, the system
logging is a mess.


> What happens if you had a couple of NMIs go off that takes up that
> time, and then you hit a bug 10 minutes later from that context. You
> just lost it.

I do not understand how this is related to the NMI context.
The messages in NMI context are not throttled!

OK, the original patch throttled also NMI messages when NMI
interrupted console drivers. But it is easy to fix.


> This is a magnitude larger than any other user of rate limit in the
> kernel. The most common time is 5 seconds. The longest I can find is 1
> minute. You are saying you want to block printing from this context for
> 60 minutes!

I see 1 day long limits in dio_warn_stale_pagecache() and
xfs_scrub_experimental_warning().

Note that most ratelimiting is related to a single message. Also it
is in situation where the system should recover within seconds.


> That is HUGE! I don't understand your rational for such a huge number.
> What data do you have to back that up with?

We want to allow seeing the entire lockdep splat (Sergey wants more
than 100 lines). Also it is not that unusual that slow console is busy
several minutes when too many things are happening.

I proposed that long delay because I want to be on the safe side.
Also I do not see a huge benefit in repeating the same messages
too often.

Alternative solution would be to allow first, lets say 250, lines
and then nothing. I mean to change the approach from rate-limiting
to print-once.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20 14:01                 ` Petr Mladek
@ 2018-04-20 14:17                   ` Steven Rostedt
  2018-04-20 14:19                     ` Steven Rostedt
  2018-04-20 14:57                     ` Petr Mladek
  0 siblings, 2 replies; 27+ messages in thread
From: Steven Rostedt @ 2018-04-20 14:17 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri, 20 Apr 2018 16:01:57 +0200
Petr Mladek <pmladek@suse.com> wrote:

> On Fri 2018-04-20 08:04:28, Steven Rostedt wrote:
> > On Fri, 20 Apr 2018 11:12:24 +0200
> > Petr Mladek <pmladek@suse.com> wrote:
> >   
> > > Yes, my number was arbitrary. The important thing is that it was long
> > > enough. Or do you know about an console that will not be able to write
> > > 100 lines within one hour?  
> > 
> > The problem is the way rate limit works. If you print 100 lines (or
> > 1000) in 5 seconds, then you just stopped printing from that context
> > for 59 minutes and 55 seconds. That's a long time to block printing.  
> 
> Are we talking about the same context?
> 
> I am talking about console drivers called from console_unlock(). It is
> very special context because it is more or less recursive:
> 
>      + could cause infinite loop
>      + the errors are usually the same again and again

The check is only when console_owner == current, which can easily
happen with an interrupt let alone an NMI.

The common case is not recursive.

> 
> As a result, if you get too many messages from this context:
> 
>      + you are lost (recursion)
>      + more messages != new information
> 
> And you need to fix the problem anyway. Otherwise, the system
> logging is a mess.
> 
> 
> > What happens if you had a couple of NMIs go off that takes up that
> > time, and then you hit a bug 10 minutes later from that context. You
> > just lost it.  
> 
> I do not understand how this is related to the NMI context.
> The messages in NMI context are not throttled!
> 
> OK, the original patch throttled also NMI messages when NMI
> interrupted console drivers. But it is easy to fix.

My mistake in just mentioning NMIs, because the check is on
console_owner which can be set with interrupts enabled. That means an
interrupt that does a print could hide printks from other interrupts or
NMIs when console_owner is set.

> 
> 
> > This is a magnitude larger than any other user of rate limit in the
> > kernel. The most common time is 5 seconds. The longest I can find is 1
> > minute. You are saying you want to block printing from this context for
> > 60 minutes!  
> 
> I see 1 day long limits in dio_warn_stale_pagecache() and
> xfs_scrub_experimental_warning().
> 
> Note that most ratelimiting is related to a single message. Also it
> is in situation where the system should recover within seconds.
> 
> 
> > That is HUGE! I don't understand your rational for such a huge number.
> > What data do you have to back that up with?  
> 
> We want to allow seeing the entire lockdep splat (Sergey wants more
> than 100 lines). Also it is not that unusual that slow console is busy
> several minutes when too many things are happening.
> 
> I proposed that long delay because I want to be on the safe side.
> Also I do not see a huge benefit in repeating the same messages
> too often.
> 
> Alternative solution would be to allow first, lets say 250, lines
> and then nothing. I mean to change the approach from rate-limiting
> to print-once.


Actually, I think we are fine with the one hour and 1000 prints if we
add to the condition. It can't just check console_owner. We need a way
to know that this is indeed a recursion. Perhaps we should set the
context we are in when setting console owner. Something like I have in
the ring buffer code.

enum {
	CONTEXT_NONE,
	CONTEXT_NMI,
	CONTEXT_IRQ,
	CONTEXT_SOFTIRQ,
	CONTEXT_NORMAL
};

int git_context(void)
{
	unsigned long pc = preempt_count();

	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
		return CONTEXT_NORMAL;
	else
		return pc & NMI_MASK ? CONTEXT_NMI :
			pc & HARDIRQ_MASK ? CONTEXT_IRQ : CONTEXT_SOFTIRQ;
}

static void console_lock_spinning_enable(void)
{
	raw_spin_lock(&console_owner_lock);
	console_owner = current;
	console_context = get_context();
	raw_spin_unlock(&console_owner_lock);
[..]


static int console_lock_spinning_disable_and_check(void)
{
	raw_spin_lock(&console_owner_lock);
	waiter = READ_ONCE(console_waiter);
	console_owner = NULL;
	console_context = CONTEXT_NONE;
	raw_spin_unlock(&console_owner_lock);
[..]


Then have your check be:

+	/* Prevent infinite loop caused by messages from console drivers. */
+	if (console_owner == current && console_context == get_context() &&
+	    !__ratelimit(&ratelimit_console))
+		return 0;

Then you know that this is definitely due to recursion.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20 14:17                   ` Steven Rostedt
@ 2018-04-20 14:19                     ` Steven Rostedt
  2018-04-20 14:57                     ` Petr Mladek
  1 sibling, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2018-04-20 14:19 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri, 20 Apr 2018 10:17:51 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> int git_context(void)

That should have been get_context(void) ;-)

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20 14:17                   ` Steven Rostedt
  2018-04-20 14:19                     ` Steven Rostedt
@ 2018-04-20 14:57                     ` Petr Mladek
  2018-04-20 15:13                       ` Steven Rostedt
  1 sibling, 1 reply; 27+ messages in thread
From: Petr Mladek @ 2018-04-20 14:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri 2018-04-20 10:17:51, Steven Rostedt wrote:
> On Fri, 20 Apr 2018 16:01:57 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> > On Fri 2018-04-20 08:04:28, Steven Rostedt wrote:
> > > The problem is the way rate limit works. If you print 100 lines (or
> > > 1000) in 5 seconds, then you just stopped printing from that context
> > > for 59 minutes and 55 seconds. That's a long time to block printing.  
> > 
> > Are we talking about the same context?
> > 
> > I am talking about console drivers called from console_unlock(). It is
> > very special context because it is more or less recursive:
> > 
> >      + could cause infinite loop
> >      + the errors are usually the same again and again
> 
> The check is only when console_owner == current, which can easily
> happen with an interrupt let alone an NMI.

Yeah. Sergey pointed this out and I suggested to update it
to

	if (console_owner == current && !in_nmi() &&
	    !__ratelimit(&ratelimit_console))
		return 0;

Only messages from console drivers called from console_unlock()
should be ratelimited. Ratelimiting any other messages was not
intended (is a bug).

The above does not handle recursion in NMI. But console drivers
are called from NMI only when we flush consoles in panic().
I wonder if it is worth the effort.


> > > What happens if you had a couple of NMIs go off that takes up that
> > > time, and then you hit a bug 10 minutes later from that context. You
> > > just lost it.  
> > 
> > I do not understand how this is related to the NMI context.
> > The messages in NMI context are not throttled!
> > 
> > OK, the original patch throttled also NMI messages when NMI
> > interrupted console drivers. But it is easy to fix.
> 
> My mistake in just mentioning NMIs, because the check is on
> console_owner which can be set with interrupts enabled. That means an
> interrupt that does a print could hide printks from other interrupts or
> NMIs when console_owner is set.

No, call_console_drivers() is done with interrupts disabled:

		console_lock_spinning_enable();

		stop_critical_timings();	/* don't trace print latency */
 ---->		call_console_drivers(ext_text, ext_len, text, len);
		start_critical_timings();

		if (console_lock_spinning_disable_and_check()) {
 ---->			printk_safe_exit_irqrestore(flags);
			goto out;
		}

 ---->		printk_safe_exit_irqrestore(flags);

They were called with interrupts disabled for ages, long before
printk_safe. In fact, it was all the time in the git kernel history.

Therefore only NMIs are in the game. And they should be solved
by the above change.


> > I proposed that long delay because I want to be on the safe side.
> > Also I do not see a huge benefit in repeating the same messages
> > too often.
> 
> Actually, I think we are fine with the one hour and 1000 prints if we
> add to the condition.

great

> It can't just check console_owner. We need a way
> to know that this is indeed a recursion. Perhaps we should set the
> context we are in when setting console owner. Something like I have in
> the ring buffer code.

> enum {
> 	CONTEXT_NONE,
> 	CONTEXT_NMI,
> 	CONTEXT_IRQ,
> 	CONTEXT_SOFTIRQ,
> 	CONTEXT_NORMAL
> };
> 
> int get_context(void)
> {
> 	unsigned long pc = preempt_count();
> 
> 	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
> 		return CONTEXT_NORMAL;
> 	else
> 		return pc & NMI_MASK ? CONTEXT_NMI :
> 			pc & HARDIRQ_MASK ? CONTEXT_IRQ : CONTEXT_SOFTIRQ;
> }

We actually would need this only when flushing consoles in NMI in panic().
I am not sure of it is worth the effort.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20 14:57                     ` Petr Mladek
@ 2018-04-20 15:13                       ` Steven Rostedt
  2018-04-23 10:32                         ` Petr Mladek
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2018-04-20 15:13 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri, 20 Apr 2018 16:57:20 +0200
Petr Mladek <pmladek@suse.com> wrote:


> No, call_console_drivers() is done with interrupts disabled:
> 
> 		console_lock_spinning_enable();
> 
> 		stop_critical_timings();	/* don't trace print latency */
>  ---->		call_console_drivers(ext_text, ext_len, text, len);  
> 		start_critical_timings();
> 
> 		if (console_lock_spinning_disable_and_check()) {
>  ---->			printk_safe_exit_irqrestore(flags);  
> 			goto out;
> 		}
> 
>  ---->		printk_safe_exit_irqrestore(flags);  
> 
> They were called with interrupts disabled for ages, long before
> printk_safe. In fact, it was all the time in the git kernel history.
> 
> Therefore only NMIs are in the game. And they should be solved
> by the above change.
> 

Really?


  console_trylock_spinning(); /* console_owner now equals current */
  console_unlock() {

---> take interrupt here:

					vprintk() {
					   vprintk_func() {
					      if (console_owner == current && !__ratelimit(&ratelimit_console))

				[ RATE LIMIT HERE!!!! ]


	for (;;) {
		printk_safe_enter_irqsave(flags);

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20  9:12             ` Petr Mladek
  2018-04-20 12:04               ` Steven Rostedt
@ 2018-04-23  5:21               ` Sergey Senozhatsky
  2018-04-23 12:26                 ` Petr Mladek
  1 sibling, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2018-04-23  5:21 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, akpm, linux-mm,
	Peter Zijlstra, Jan Kara, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (04/20/18 11:12), Petr Mladek wrote:
[..]
> Will 1000 lines within 1 hour be enough for you, please?

I'm afraid it won't.

> I am lost. In the mail
[..]
> My understanding of the older mail is that you called
> console_drivers() in printk_safe() context only because it was
> easier to disable printk_safe context later together with
> enabling irqs.

Correct.

> My understanding of today's mail is that it is important
> to call console drivers in printk_safe() context.

No, I don't think that it is what I said.

> It is a contradiction. Could you please explain?

Let me try again.

call_console_drivers() is super complex, unbelievable complex. In fact,
it's so complex that we never know where we will end up, because it can
pass the control to almost every core kernel mechanism or subsystem:
kobjects, spin locks, tty, sdp, uart, vt, fbdev, dri, kms, timers,
timekeeping, networking, mm, scheduler, you name it. Thousands and
thousands lines of code, which are not executed exclusively by the
console drivers. That core kernel code that we are dealing with has its
own fault/error reporting mechanisms, some of which comes in forms of
WARN_ON()-s or printk()-s, or dump_stack()-s, or BUG_ON()-s and so on.

Now, for many, many years printk()-s from console_unlock()->call_console_driver()
were absolutely legal and fine. And it was useful, and helpful mechanism,
and that's why people used it (and continue to do so). A very quick
googling:
	https://bugzilla.altlinux.org/attachment.cgi?id=5811
or
	https://access.redhat.com/solutions/702533
or
	https://bugzilla.redhat.com/attachment.cgi?id=561164
or
	https://lists.gt.net/linux/kernel/2341113
or
	https://www.systutorials.com/linux-kernels/56987/ib-mlx4-reduce-sriov-multicast-cleanup-warning-message-to-debug-level-linux-4-10-17/
or
	https://github.com/raspberrypi/linux/issues/663
or
	https://bugs.openvz.org/browse/VZWEB-36
or
  any other bug report which involves console_unlock()->call_console_drivers(),
there are *tons* of them. And the reason why those printk()-s were, and
they still are, legal was [and is] because those printk()-s were [and are]
harmless - they didn't [don't] deadlock the system. [not to mention VT, console
drivers, etc. debugging]. Throttling down that error mechanism to 100 lines
per hour, or 1000 lines per hour is unlikely will be welcomed.

When we introduced printk_safe() we had a completely different goal.
printk_safe() did not make call_console_drivers() any safer. Because
printk_safe() has *nothing* to do with console drivers or the underlying
code. The only thing that has changed on the console_drivers side with
the introduction of printk_safe() was that we enabled lockdep, and thus
RCU sanity checks, in printk() and console_drivers. So we just opened up
one more error reporting channel - a small, but very important, addition
to already existing numerous error reporting printk()-s, dump_stack()-s
which call_console_drivers()->foo() can trigger. And that additional
console_drivers error reporting channel works really well for us:
	http://lkml.kernel.org/r/20170928120405.18273-1-sergey.senozhatsky@gmail.com
or
	http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
or
	lkml.kernel.org/r/alpine.LFD.2.20.1703201736070.1753@schleppi
or
	so on. We have a number of those "additional" reports.

When it comes to call_console_drivers() neither lockdep nor RCU need
printk_safe(). Because console_unlock()->call_console_drivers()->printk()
is totally normal, legal, fine, and has been around for years. We need
printk_safe() because of the way vprintk_emit() works - we protect logbuf
spin_lock and console_sem spin_lock with print_safe, -- not because of the
console_drivers [which don't deal with logbuf or console_sem to begin
with]. In other words, printk_safe() is *irrelevant* when it comes to
console drivers.

If we continue calling console_drivers under printk_safe(), and I don't
think that we should do so [I said it several times], then it is *absolutely*
important to keep it as permissive as possible [I also said it several times].
But let me be crystal clear - we better stop calling console_drivers under
printk_safe, it is pointless, useless and introduces unneeded IRQ->work->flush
dependency. It is also my believe that printk()-s from call_console_drivers()
must stay "un-throttled", exactly the way there were [and they are] for years.
Otherwise, we simply will "secretly" and "suprisingly" turn a HUGE number of
printk()-s, dump_stack(), WARN_ON()-s, etc. in sched, net, mm, tty, fbdev, vt,
you name it, into rate-limited printk() for no reason. We will shut up quite a
number of valid, useful and important error reporting channels. This does look
and smell like a massive, massive regression.

Basically, from the vt, tty, timekeeping, sched, net, mm, you name it,
prospective, what we do is:

---

 kernel/printk/printk.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..8df861e6e0a3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1828,6 +1828,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 			    const char *dict, size_t dictlen,
 			    const char *fmt, va_list args)
 {
+	static DEFINE_RATELIMIT_STATE(ratelimit_console, 60 * 60 * HZ, 100);
 	static char textbuf[LOG_LINE_MAX];
 	char *text = textbuf;
 	size_t text_len;
@@ -1836,6 +1837,9 @@ asmlinkage int vprintk_emit(int facility, int level,
 	int printed_len;
 	bool in_sched = false;
 
+	if (!__ratelimit(&ratelimit_console))
+		return 0;
+
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
 		in_sched = true;

---

It is really so.

What part of this plan works for us?

Among all the patches and proposal that we saw so far, one stands out - it's
the original Tejun's patch [offloading to work queue]. Because it has zero
interference with the existing call_console_drivers()->printk() channels.
Whatever comes from any of the underlying subsystems [networking, vt, mm,
you name it] comes for a reason, we should not blindly and boldly discard
those messages. And I really love that part of Tejun's patch.

But let's think about it - we haven't even looked at the messages that are
supposedly killing Tejun's boxes. We can't be serious now. We don't even
know what we are dealing with, we don't even know what [and why] the kernel
was reporting to us. We should have looked at the logs in the first place,
and probably consider some rate-limiting in the misbehaving code, like we
always do.

What is so special about this case that we decided to screw up printk()
instead?

So I'm going to volunteer and I'm willing to take a look at the logs
(hopefully Tejun can send some).

I think that we need to apply the patch below.
That call_console_drivers()->printk->IRQ_work->irq->flush appears to be
pointless.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 247808333ba4..484c456c095a 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2385,9 +2385,11 @@ void console_unlock(void)
 		 */
 		console_lock_spinning_enable();
 
+		__printk_safe_exit();
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
+		__printk_safe_enter();
 
 		if (console_lock_spinning_disable_and_check()) {
 			printk_safe_exit_irqrestore(flags);
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-20 15:13                       ` Steven Rostedt
@ 2018-04-23 10:32                         ` Petr Mladek
  2018-04-23 11:36                           ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Mladek @ 2018-04-23 10:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri 2018-04-20 11:13:07, Steven Rostedt wrote:
> On Fri, 20 Apr 2018 16:57:20 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> 
> > No, call_console_drivers() is done with interrupts disabled:
> > 
> > 		console_lock_spinning_enable();
> > 
> > 		stop_critical_timings();	/* don't trace print latency */
> >  ---->		call_console_drivers(ext_text, ext_len, text, len);  
> > 		start_critical_timings();
> > 
> > 		if (console_lock_spinning_disable_and_check()) {
> >  ---->			printk_safe_exit_irqrestore(flags);  
> > 			goto out;
> > 		}
> > 
> >  ---->		printk_safe_exit_irqrestore(flags);  
> > 
> > They were called with interrupts disabled for ages, long before
> > printk_safe. In fact, it was all the time in the git kernel history.
> > 
> > Therefore only NMIs are in the game. And they should be solved
> > by the above change.
> > 
> 
> Really?
> 
> 
>   console_trylock_spinning(); /* console_owner now equals current */

No, console_trylock_spinning() does not modify console_owner. The
handshake is done using console_waiter variable.

console_owner is really set only between:

    console_lock_spinning_enable()
    console_lock_spinning_disable_and_check()

and this entire section is called with interrupts disabled.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-23 10:32                         ` Petr Mladek
@ 2018-04-23 11:36                           ` Steven Rostedt
  2018-04-23 12:45                             ` Petr Mladek
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2018-04-23 11:36 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Mon, 23 Apr 2018 12:32:32 +0200
Petr Mladek <pmladek@suse.com> wrote:

> > Really?
> > 
> > 
> >   console_trylock_spinning(); /* console_owner now equals current */  
> 
> No, console_trylock_spinning() does not modify console_owner. The
> handshake is done using console_waiter variable.

Ug, you're right. Somehow when I looked at where console_owner was set
"console_lock_spinning_enabled" I saw it as "console_trylock_spinning".

This is what I get when I'm trying to follow three threads at the same
time :-/

> 
> console_owner is really set only between:
> 
>     console_lock_spinning_enable()
>     console_lock_spinning_disable_and_check()
> 
> and this entire section is called with interrupts disabled.

OK, I agree with you now. Although, one hour may still be too long.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-23  5:21               ` Sergey Senozhatsky
@ 2018-04-23 12:26                 ` Petr Mladek
  2018-04-23 13:00                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Mladek @ 2018-04-23 12:26 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Mon 2018-04-23 14:21:33, Sergey Senozhatsky wrote:
> On (04/20/18 11:12), Petr Mladek wrote:
> call_console_drivers() is super complex, unbelievable complex. In fact,
> it's so complex that we never know where we will end up, because it can
> pass the control to almost every core kernel mechanism or subsystem:

I fully agree. We will never be able to completely avoid messages
from these code paths and it would even be contra-productive. People
need to see the problems and be able to debug them. BTW: I wrote
this in the patch description.

> A very quick googling:

Nice collection. Just note that the useful (ratelimited) information
always fits into the first 100 lines in all these examples:

> 	https://bugzilla.altlinux.org/attachment.cgi?id=5811

The two WARNINGs starts repeating after 65 lines.

Note that the backtraces at the end of the log are from
NMI watchdog => not ratelimited by this patch (with the extra fix)


> 	https://access.redhat.com/solutions/702533

fits into 52 lines

> 	https://bugzilla.redhat.com/attachment.cgi?id=561164

both warnings fit into 54 lines

> 	https://lists.gt.net/linux/kernel/2341113

BUG report fits into 26 lines

> 	https://www.systutorials.com/linux-kernels/56987/ib-mlx4-reduce-sriov-multicast-cleanup-warning-message-to-debug-level-linux-4-10-17/

NMI bactrace => not affected by this patch (with the extra fix)

> 	https://github.com/raspberrypi/linux/issues/663

Bug report fits into 29 lines

> 	https://bugs.openvz.org/browse/VZWEB-36

Starts to repeat after 79 lines. I wonder if it ever ended without
killing the system.




> Throttling down that error mechanism to 100 lines
> per hour, or 1000 lines per hour is unlikely will be welcomed.

I wonder if you have bigger problems with the number of lines
or with the length of the period.

We simply _must_ limit the number of lines. Otherwise we would
never be able to break an _infinite_ loop.

But we could eventually replace the time period with a more
complex solution. For example, if we call console drivers
outside printk_safe context then the "recursive" messages
will be written to the mail log buffer directly. Then we could
reset the counter of the recursive messages when leaving
console_unlock() the regular way. I mean when all lines are
handled.


> Among all the patches and proposal that we saw so far, one stands out - it's
> the original Tejun's patch [offloading to work queue]. Because it has zero
> interference with the existing call_console_drivers()->printk()
> channels.

The only problem is that it does not solve the infinite loop. If
writing one line produces one or more new lines (warnings/errors)
than the only way out is to start dropping the recursive messages.
Offloading would just move the infinite loop to another process.

Note that the offload might help if there is a deadlock/livelock
between the original printk() caller and console drivers. Then moving
console_unlock() to another "safe" context helps to unblock the
situation. But the offload has its own problems and limiting
the number of recursive messages would solve this as well.


> What is so special about this case that we decided to screw up printk()
> instead?

There are many situations where printk() is limited. For example, we
are limited to 8kB in NMI or printk_safe() context. The printk_safe()
context is about printk debugging. Also messages from console drivers
are about printk debugging. There must be some limitations by
definition.


> I think that we need to apply the patch below.
> That call_console_drivers()->printk->IRQ_work->irq->flush appears to be
> pointless.

I agree that calling console drivers in printk_safe context is
pointless. Normal vprintk_emit() can be safely called because
logbuf_lock is not taken here. Also console_unlock() will
never by called recursively because it is guarded by console_sem.

> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/printk/printk.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 247808333ba4..484c456c095a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2385,9 +2385,11 @@ void console_unlock(void)
>  		 */
>  		console_lock_spinning_enable();
>  
> +		__printk_safe_exit();
>  		stop_critical_timings();	/* don't trace print latency */
>  		call_console_drivers(ext_text, ext_len, text, len);
>  		start_critical_timings();
> +		__printk_safe_enter();

Is this by intention? What is the reason to call
console_lock_spinning_disable_and_check() in printk_safe() context, please?

>  		if (console_lock_spinning_disable_and_check()) {
>  			printk_safe_exit_irqrestore(flags);

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-23 11:36                           ` Steven Rostedt
@ 2018-04-23 12:45                             ` Petr Mladek
  2018-04-25  5:31                               ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Mladek @ 2018-04-23 12:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Mon 2018-04-23 07:36:03, Steven Rostedt wrote:
> On Mon, 23 Apr 2018 12:32:32 +0200
> Ug, you're right. Somehow when I looked at where console_owner was set
> "console_lock_spinning_enabled" I saw it as "console_trylock_spinning".
> 
> This is what I get when I'm trying to follow three threads at the same
> time :-/

They are not easy to follow :-/

> > console_owner is really set only between:
> > 
> >     console_lock_spinning_enable()
> >     console_lock_spinning_disable_and_check()
> > 
> > and this entire section is called with interrupts disabled.
> 
> OK, I agree with you now. Although, one hour may still be too long.

I am not sure how slow are the slowest consoles. If I take that
everything should be faster than 1200 bauds. Then 10 minutes
should be enough for 1000 lines and 80 characters per-line:

   1000*80*8/1200/60 = 8.8888888


Alternatively, it seems that we are going to call console drivers
outside printk_safe context => the messages will appear in the main
log buffer immediately => only small risk of a ping-pong with printk
safe buffers. We might reset the counter when all messages are handled
in console_unlock(). It will be more complex patch than when using
ratelimiting but it still should be sane.

Neither solution is perfect. But I think that the recursion is not
worth any too complex solution.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-23 12:26                 ` Petr Mladek
@ 2018-04-23 13:00                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2018-04-23 13:00 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, akpm, linux-mm,
	Peter Zijlstra, Jan Kara, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (04/23/18 14:26), Petr Mladek wrote:
> need to see the problems and be able to debug them. BTW: I wrote
> this in the patch description.
> 
> > A very quick googling:
> 
> Nice collection. Just note that the useful (ratelimited) information
> always fits into the first 100 lines in all these examples:

I'm *very and really* sorry to ask that, but are you serious now?
Apologies if I'm not getting a joke here, or something.

[..]
> > Throttling down that error mechanism to 100 lines
> > per hour, or 1000 lines per hour is unlikely will be welcomed.
> 
> I wonder if you have bigger problems with the number of lines
> or with the length of the period.
> 
> We simply _must_ limit the number of lines. Otherwise we would
> never be able to break an _infinite_ loop.

Give me examples of such messages, and please do explain why you were
not able to rate-limit them and instead decided to introduce a system
wide printk() rate-limit.

> > Among all the patches and proposal that we saw so far, one stands out - it's
> > the original Tejun's patch [offloading to work queue]. Because it has zero
> > interference with the existing call_console_drivers()->printk()
> > channels.
> 
> The only problem is that it does not solve the infinite loop.

Same as above. I'm not kidding. I really want to know what unfixable&endless
messages you are talking about? May I take look at the backtraces?

> > What is so special about this case that we decided to screw up printk()
> > instead?
> 
> Also messages from console drivers are about printk debugging. There must
> be some limitations by definition.

No. Check the links that I found after _literally_ 5 seconds of googling.
Tons of messages are coming from core kernel code. Nothing to do with
the debugging. It *is* a valid and widely used error reporting channel.
End of story.


SERIOUSLY. PLEASE (!) - don't turn printk() into rate-limited printk().
Don't introduce that HUGE regression. Let's handle it the same way as we
always do - let's look at the logs, and rate-limit misbehaving code.


> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 247808333ba4..484c456c095a 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2385,9 +2385,11 @@ void console_unlock(void)
> >  		 */
> >  		console_lock_spinning_enable();
> >  
> > +		__printk_safe_exit();
> >  		stop_critical_timings();	/* don't trace print latency */
> >  		call_console_drivers(ext_text, ext_len, text, len);
> >  		start_critical_timings();
> > +		__printk_safe_enter();
> 
> Is this by intention? What is the reason to call
> console_lock_spinning_disable_and_check() in printk_safe() context, please?

Yes, it is. console_lock_spinning_enable() is touching console_owner_lock:
an `internal' printk lock -- which we also touch in vprintk_emit(). As such
[internal printk lock] it must be accessed under printk_safe(), by definition.

> >  		if (console_lock_spinning_disable_and_check()) {
> >  			printk_safe_exit_irqrestore(flags);

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-23 12:45                             ` Petr Mladek
@ 2018-04-25  5:31                               ` Sergey Senozhatsky
  2018-04-26  9:42                                 ` Petr Mladek
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2018-04-25  5:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Steven Rostedt, Sergey Senozhatsky, akpm, linux-mm,
	Peter Zijlstra, Jan Kara, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (04/23/18 14:45), Petr Mladek wrote:
[..]
> I am not sure how slow are the slowest consoles. If I take that
> everything should be faster than 1200 bauds. Then 10 minutes
> should be enough for 1000 lines and 80 characters per-line:

Well, the problem with the numbers is that they are too... simple...
let me put it this way.

What if I don't have a slow serial console? Or what if I have NMI
watchdog set to 40 seconds? Or what if I don't have NMIs at all?
Why am I all of a sudden limited by "1200 bauds"?

Another problem is that we limit the *wrong* thing.

Not only because we can [and probably need to] rate-limit the misbehaving
code that calls printk()-s, instead of printk(). But because we claim
that we limit the "number of lines" added recursively. This is wrong.
We limit the number of times vprintk_func() was called, which is != the
number of added lines. Because vprintk_func() is also called for pr_cont()
or printk(KERN_CONT) or printk("missing new line"). Backtraces contain
tons and tons of pr_cont()-s - registers print out, list of modules
print out, stack print out, code print out. Even this thing at the
bottom of a trace:

	Code: 01 ca 49 89 d1 48 89 d1 48 c1 ea 23 48 8b 14 d5 80 23 63 82 49 c1 e9 0c 48 c1 e9 1b 48 85 d2 74 0a 0f b6 c9 48 c1 e1 04 48 01 ca <48> 8b 12 49 c1 e1 06 b9 00 00 00 80 89 7d 80 89 75 84 48 8b 3d

is nothing but a bunch of pr_cont()-s, each of which will individually
end up in vprintk_func(). Error reports in general can contain even more
pr_cont() calls. E.g. core kernel code can hex dump slab memory, while
being called from one of console drivers.

Another problem is that nothing tells us that we *actually* have an
infinite loop. Nothing tells us that every call_console_drivers()
adds more messages to the logbuf. We see just one thing - the current
call_console_drivers() is about to add some lines to the logbuf later
on. OK, why is this a problem? This can be a one time thing. Or
console_unlock() may be in a schedulable context, getting rescheduled
after every line it prints [either implicitly after
printk_safe_exit_irqrestore(), or explicitly by calling into the
scheduler - cond_resched()].

Most likely, we don't even realize how many things we are about to
break.

> Alternatively, it seems that we are going to call console drivers
> outside printk_safe context => the messages will appear in the main
> log buffer immediately => only small risk of a ping-pong with printk
> safe buffers. We might reset the counter when all messages are handled
> in console_unlock(). It will be more complex patch than when using
> ratelimiting but it still should be sane.

We may have some sort of vprintk_func()-based solution, may be.
But we first need a real reason. Right now it looks to me like
we have "a solution" to a problem which we have never witnessed.

That vprintk_func()-based solution, if there will be no other
options on the table, must be much smarter than anything that
we have seen so far.

Sorry.

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-25  5:31                               ` Sergey Senozhatsky
@ 2018-04-26  9:42                                 ` Petr Mladek
  2018-04-27 10:22                                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 27+ messages in thread
From: Petr Mladek @ 2018-04-26  9:42 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Wed 2018-04-25 14:31:46, Sergey Senozhatsky wrote:
> On (04/23/18 14:45), Petr Mladek wrote:
> [..]
> > I am not sure how slow are the slowest consoles. If I take that
> > everything should be faster than 1200 bauds. Then 10 minutes
> > should be enough for 1000 lines and 80 characters per-line:
> 
> Well, the problem with the numbers is that they are too... simple...
> let me put it this way.

Sometimes a simple solution is the acceptable one.

For example, I created a complicated solution for lock-less printk
in NMI that reduced loosing messages to minimum. It ended in lkml
archives, covered by not very nice comments. Instead we ended with
quite limited per-CPU buffers.

Believe me, I could perfectly understand the desire to create perfect
defensive solutions that would never break anything. It is not easy
to decide when the best-effort solutions are worth the risk.


> What if I don't have a slow serial console? Or what if I have NMI
> watchdog set to 40 seconds? Or what if I don't have NMIs at all?
> Why am I all of a sudden limited by "1200 bauds"?

Because it keeps the solution simple and I believe that it might
be enough.


> We limit the number of times vprintk_func() was called, which is != the
> number of added lines. Because vprintk_func() is also called for
> pr_cont().

Good point. Well, it is rather trivial to count characters instead
of a number of calls.


> Another problem is that nothing tells us that we *actually* have an
> infinite loop. This can be a one time thing.

  + one time event => should fit into limit => nothing lost

  + repeated thing => repeated message does not bring new information
     => limiting is acceptable and even useful.

Honestly, I do not believe that console drivers are like Scheherazade.
They are not able to make up long interesting stories. Let's say that
lockdep splat has more than 100 lines but it can happen only once.
Let's say that WARNs have about 40 lines. I somehow doubt that we
could ever see 10 different WARN calls from one con->write() call.

1200 baud slow console is able to handle 90kB of text within 10
minutes:

   (1200 / 8) * 60 * 10 = 90000

It is more than 1000 lines full of 80 characters:

   (1200 / 8) * 60 * 10 / 80 = 1125

IMHO, it is more than enough to get useful report.


> But we first need a real reason. Right now it looks to me like
> we have "a solution" to a problem which we have never witnessed.

I am trying to find a "simple" and generic solution for the problem
reported by Tejun:

<paste from 20180110170223.GF3668920@devbig577.frc2.facebook.com>
The particular case that we've been seeing regularly in the fleet was
the following scenario

1. Console is IPMI emulated serial console.  Super slow.  Also
   netconsole is in use.
2. System runs out of memory, OOM triggers.
3. OOM handler is printing out OOM debug info.
4. While trying to emit the messages for netconsole, the network stack
   / driver tries to allocate memory and then fail, which in turn
   triggers allocation failure or other warning messages.  printk was
   already flushing, so the messages are queued on the ring.
5. OOM handler keeps flushing but 4 repeats and the queue is never
   shrinking.  Because OOM handler is trapped in printk flushing, it
   never manages to free memory and no one else can enter OOM path
   either, so the system is trapped in this state.
</paste>

It was a bit specific because offloading helped to unblock OOM path
and calm down the errors.

But let me use your words:

<paste from 20180425053146.GA25288@jagdpanzerIV>
call_console_drivers() is super complex, unbelievable complex. In fact,
it's so complex that we never know where we will end up, because it can
pass the control to almost every core kernel mechanism or subsystem:
kobjects, spin locks, tty, sdp, uart, vt, fbdev, dri, kms, timers,
timekeeping, networking, mm, scheduler, you name it. Thousands and
thousands lines of code, which are not executed exclusively by the
console drivers.
</paste>

For me it is hard to believe that all these possible errors will be
cured just by offloading. Not to say that offloading is not trivial
and there is some resistance against it.

Finally, the limiting does not only help to unblock the victim that
called console_unlock(). I also helps printk() subsystem to survive
a potential infinite DOS attack.



Well, I think about alternative solution inspired by:

<paste from 20180416042553.GA555@jagdpanzerIV>
I'd really prefer to rate limit the function which flushes per-CPU
printk_safe buffers; not the function that appends new messages to
the per-CPU printk_safe buffers.
</paste>

The is one unused flag in struct printk_log flags:5. We could use
it to mark messages from console drivers context. Then we could store
all messages and ratelimit only flushing these messages to the
console.

I still think that limiting the write into the log buffer is better
approach because it would prevent loosing other useful messages.
But this alternative approach is definitely more conservative.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-26  9:42                                 ` Petr Mladek
@ 2018-04-27 10:22                                   ` Sergey Senozhatsky
  2018-05-09 12:00                                     ` Petr Mladek
  0 siblings, 1 reply; 27+ messages in thread
From: Sergey Senozhatsky @ 2018-04-27 10:22 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Steven Rostedt, akpm, linux-mm,
	Peter Zijlstra, Jan Kara, Tetsuo Handa, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (04/26/18 11:42), Petr Mladek wrote:
[..]
> 
> Believe me, I could perfectly understand the desire to create perfect
> defensive solutions that would never break anything. It is not easy
> to decide when the best-effort solutions are worth the risk.

Yes, but my point is - I don't think we clearly understand the root
cause of the problem. See below [you can jump over the next section].

[..]
> Honestly, I do not believe that console drivers are like Scheherazade.
> They are not able to make up long interesting stories. Let's say that
> lockdep splat has more than 100 lines but it can happen only once.
> Let's say that WARNs have about 40 lines. I somehow doubt that we
> could ever see 10 different WARN calls from one con->write() call.

The problem here is that it takes a human being with IQ to tell what's
repetitive, what's useless and what's not.

	vprintk(...)
	{
		if (!__ratelimit())
			return;
	}

has zero IQ to make such decisions. Sorry, the numbers don't work for me.
"Console drivers added 100 lines in 1 hour" does not tell me that we had
"an infinite console_unlock() loop".
Dunno. Quite likely I'm wrong. Wouldn't be the first time ever. But it's
unclear to me why we are out of options, without even looking at
logs, but... May be.


> > But we first need a real reason. Right now it looks to me like
> > we have "a solution" to a problem which we have never witnessed.
> 
> I am trying to find a "simple" and generic solution for the problem
> reported by Tejun:
[..]
> 1. Console is IPMI emulated serial console.  Super slow.  Also
>    netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>    / driver tries to allocate memory and then fail, which in turn
>    triggers allocation failure or other warning messages.  printk was
>    already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>    shrinking.  Because OOM handler is trapped in printk flushing, it
>    never manages to free memory and no one else can enter OOM path
>    either, so the system is trapped in this state.
> </paste>

Yes, and that's why I want to take a look at the logs/backtraces.

I had a very-very quick look at netconsole code, and some parts of net
code [skb allocation, etc]. And I didn't manage to find that "every
console driver call adds new messages to the logbuf".

OK, suppose that at some point netcon or net stack does kmalloc(GFP_ATOMIC)
or alloc_pages(GFP_ATOMIC). So if it was kmalloc() and we need to
cache_grow_begin()->kmem_getpages() or ___slab_alloc()->new_slab(),
then:

- kmem_getpages() can slab_out_of_memory().
  But slab_out_of_memory():

 a) ratelimited to DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST
 b) enabled only `#if DEBUG' [probably out of consideration]

- So we end up in __alloc_pages_slowpath():

  alloc_pages() -> __alloc_pages_slowpath()
or
  kmem_getpages()->__alloc_pages_slowpath()
or
  ___slab_alloc()->new_slab()->__alloc_pages_slowpath()

__alloc_pages_slowpath() can call warn_alloc(), yes.

But warn_alloc() is also ratelimted to DEFAULT_RATELIMIT_INTERVAL and
DEFAULT_RATELIMIT_BURST.


That "triggers allocation failure or other warning messages" part is
questionable.


1) Does the bug report actually say that - DEFAULT_RATELIMIT_INTERVAL
and DEFAULT_RATELIMIT_BURST in MM are not good enough? If so, then why do
we "fix" printk()? Can we just tweak the rate limiting in MM code?

2) If the bug report is saying that MM code has non-ratelimited
warnings/printouts in allocation path - then we need to rate limit
those; the same way it's done in slab_out_of_memory() / warn_alloc().


OK...

So DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST allows
up to 10 warn_alloc()-s in 5 seconds. Let's assume that allocation
failure backtrace is around 50 lines [including registers print out].
Along with backtrace we have Mem-info print out. Let's assume that it's
around 30 lines. So a single warn_alloc() can be around 80 lines.
Mem-info lines are very long, so it would be reasonable to count the
number of characters here, but let's just use lines. So we can have around
800 lines in just 10 seconds.

Now, I think "Console is IPMI emulated serial console.  Super slow."
is a critically important bit of information. [And that's why we better
count the number of characters in OOM report, not the lines].

We print to the consoles sequentially, hence netconsole becomes as "fast"
as that super slow IPMI console. MM ratelimit, meanwhile, does not care.
The slower IPMI is, the "sooner" netconsole gets its chance to add 10 more
kmalloc()->warn_alloc() to the logbuf - another ~800 lines, or may be
significantly more than that. Because of slow IPMI we don't call
netconsole frequent enough to get advantage of the rate limiting in
warn_alloc().

[And notice how "100/1000 lines per hour" is a bad news in this regard].


If this is the case, and if my assumptions are valid, then we really
should not rate limit console_drivers()->printk(). Mem-info print outs
don't seem to be "a repetitive garbage" that we can easily discard.
Those are important bits of information, which show the OOM/reclaimer
progress, and so on.


So, Petr, let's slow down for a second. There are things that are
not completely clear.

Opinions?

[..]
> For me it is hard to believe that all these possible errors will be
> cured just by offloading. Not to say that offloading is not trivial
> and there is some resistance against it.

Well, I didn't say this. All I said was that I really like that "the patch
does not pretend to be smart and does not drop random printk() messages"
part.

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-04-27 10:22                                   ` Sergey Senozhatsky
@ 2018-05-09 12:00                                     ` Petr Mladek
  2018-05-09 12:59                                       ` Steven Rostedt
  2019-02-26 10:24                                       ` Tetsuo Handa
  0 siblings, 2 replies; 27+ messages in thread
From: Petr Mladek @ 2018-05-09 12:00 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Steven Rostedt, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Fri 2018-04-27 19:22:45, Sergey Senozhatsky wrote:
> On (04/26/18 11:42), Petr Mladek wrote:
> [..]
> > Honestly, I do not believe that console drivers are like Scheherazade.
> > They are not able to make up long interesting stories. Let's say that
> > lockdep splat has more than 100 lines but it can happen only once.
> > Let's say that WARNs have about 40 lines. I somehow doubt that we
> > could ever see 10 different WARN calls from one con->write() call.
> 
> The problem here is that it takes a human being with IQ to tell what's
> repetitive, what's useless and what's not.
> 
> 	vprintk(...)
> 	{
> 		if (!__ratelimit())
> 			return;
> 	}
> 
> has zero IQ to make such decisions.

You make it too complicated. Also it seems that you repeatedly
hide the fact that con->write() context is recursive. Just try to add
printk() into call_console_drivers() and see what happens.

IMHO, if con->write() wants to add more than 1000 (or 100 or whatever
sane limit) new lines then something is really wrong and we should
stop it. It is that simple.


> > > But we first need a real reason. Right now it looks to me like
> > > we have "a solution" to a problem which we have never witnessed.
> > 
> > I am trying to find a "simple" and generic solution for the problem
> > reported by Tejun:
> [..]
> > 1. Console is IPMI emulated serial console.  Super slow.  Also
> >    netconsole is in use.
> > 2. System runs out of memory, OOM triggers.
> > 3. OOM handler is printing out OOM debug info.
> > 4. While trying to emit the messages for netconsole, the network stack
> >    / driver tries to allocate memory and then fail, which in turn
> >    triggers allocation failure or other warning messages.  printk was
> >    already flushing, so the messages are queued on the ring.
> > 5. OOM handler keeps flushing but 4 repeats and the queue is never
> >    shrinking.  Because OOM handler is trapped in printk flushing, it
> >    never manages to free memory and no one else can enter OOM path
> >    either, so the system is trapped in this state.
> > </paste>

IMHO, we do not need to chase down this particular problem. It was
already "solved" by the commit 400e22499dd92613821 ("mm: don't warn
about allocations which stall for too long").

It was just an example. I wanted to make con->write() generally safe.
I thought that the problem (recursion) was clear enough.


> Yes, and that's why I want to take a look at the logs/backtraces.

If you want more cases to analyze, fair enough. I do not have any
at hands. It is not an urgent issue for me and I am not going to
spend more time on this.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-05-09 12:00                                     ` Petr Mladek
@ 2018-05-09 12:59                                       ` Steven Rostedt
  2019-02-26 10:24                                       ` Tetsuo Handa
  1 sibling, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2018-05-09 12:59 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tetsuo Handa, Tejun Heo, linux-kernel, Sergey Senozhatsky

On Wed, 9 May 2018 14:00:50 +0200
Petr Mladek <pmladek@suse.com> wrote:

> IMHO, if con->write() wants to add more than 1000 (or 100 or whatever
> sane limit) new lines then something is really wrong and we should
> stop it. It is that simple.
> 

+1

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2018-05-09 12:00                                     ` Petr Mladek
  2018-05-09 12:59                                       ` Steven Rostedt
@ 2019-02-26 10:24                                       ` Tetsuo Handa
  2019-02-28  4:45                                         ` Sergey Senozhatsky
  1 sibling, 1 reply; 27+ messages in thread
From: Tetsuo Handa @ 2019-02-26 10:24 UTC (permalink / raw)
  To: Petr Mladek, Sergey Senozhatsky
  Cc: Steven Rostedt, akpm, linux-mm, Peter Zijlstra, Jan Kara,
	Tejun Heo, linux-kernel, Sergey Senozhatsky

On 2018/05/09 21:00, Petr Mladek wrote:
>>>> But we first need a real reason. Right now it looks to me like
>>>> we have "a solution" to a problem which we have never witnessed.
>>>
>>> I am trying to find a "simple" and generic solution for the problem
>>> reported by Tejun:
>> [..]
>>> 1. Console is IPMI emulated serial console.  Super slow.  Also
>>>    netconsole is in use.
>>> 2. System runs out of memory, OOM triggers.
>>> 3. OOM handler is printing out OOM debug info.
>>> 4. While trying to emit the messages for netconsole, the network stack
>>>    / driver tries to allocate memory and then fail, which in turn
>>>    triggers allocation failure or other warning messages.  printk was
>>>    already flushing, so the messages are queued on the ring.
>>> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>>>    shrinking.  Because OOM handler is trapped in printk flushing, it
>>>    never manages to free memory and no one else can enter OOM path
>>>    either, so the system is trapped in this state.
>>> </paste>
> 
> IMHO, we do not need to chase down this particular problem. It was
> already "solved" by the commit 400e22499dd92613821 ("mm: don't warn
> about allocations which stall for too long").

Does memory allocation by network stack / driver while trying to emit
the messages include __GFP_DIRECT_RECLAIM flag (e.g. GFP_KERNEL) ?
Commit 400e22499dd92613821 handles only memory allocations with
__GFP_DIRECT_RECLAIM flag. If memory allocation when trying to emit
the messages does not include __GFP_DIRECT_RECLAIM flag (e.g.
GFP_ATOMIC / GFP_NOWAIT), doesn't this particular problem still exist?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] printk: Ratelimit messages printed by console drivers
  2019-02-26 10:24                                       ` Tetsuo Handa
@ 2019-02-28  4:45                                         ` Sergey Senozhatsky
  0 siblings, 0 replies; 27+ messages in thread
From: Sergey Senozhatsky @ 2019-02-28  4:45 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Petr Mladek, Sergey Senozhatsky, Steven Rostedt, akpm, linux-mm,
	Peter Zijlstra, Jan Kara, Tejun Heo, linux-kernel,
	Sergey Senozhatsky

On (02/26/19 19:24), Tetsuo Handa wrote:
> Does memory allocation by network stack / driver while trying to emit
> the messages include __GFP_DIRECT_RECLAIM flag (e.g. GFP_KERNEL) ?
> Commit 400e22499dd92613821 handles only memory allocations with
> __GFP_DIRECT_RECLAIM flag. If memory allocation when trying to emit
> the messages does not include __GFP_DIRECT_RECLAIM flag (e.g.
> GFP_ATOMIC / GFP_NOWAIT), doesn't this particular problem still exist?

Console drivers are always called from atomic contexts (including
IRQ->printk->console_driver paths); if any console driver does a
GFP_KERNEL allocation then it should be fixed.

	-ss

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-02-28  4:45 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-13 12:47 [PATCH] printk: Ratelimit messages printed by console drivers Petr Mladek
2018-04-13 14:12 ` Steven Rostedt
2018-04-14  2:35   ` Sergey Senozhatsky
2018-04-16  1:47     ` Sergey Senozhatsky
2018-04-16  4:25       ` Sergey Senozhatsky
2018-04-19 12:53         ` Petr Mladek
2018-04-20  2:15           ` Sergey Senozhatsky
2018-04-20  9:12             ` Petr Mladek
2018-04-20 12:04               ` Steven Rostedt
2018-04-20 14:01                 ` Petr Mladek
2018-04-20 14:17                   ` Steven Rostedt
2018-04-20 14:19                     ` Steven Rostedt
2018-04-20 14:57                     ` Petr Mladek
2018-04-20 15:13                       ` Steven Rostedt
2018-04-23 10:32                         ` Petr Mladek
2018-04-23 11:36                           ` Steven Rostedt
2018-04-23 12:45                             ` Petr Mladek
2018-04-25  5:31                               ` Sergey Senozhatsky
2018-04-26  9:42                                 ` Petr Mladek
2018-04-27 10:22                                   ` Sergey Senozhatsky
2018-05-09 12:00                                     ` Petr Mladek
2018-05-09 12:59                                       ` Steven Rostedt
2019-02-26 10:24                                       ` Tetsuo Handa
2019-02-28  4:45                                         ` Sergey Senozhatsky
2018-04-23  5:21               ` Sergey Senozhatsky
2018-04-23 12:26                 ` Petr Mladek
2018-04-23 13:00                   ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).