linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] printk: always report dropped messages
@ 2016-12-24 14:09 Sergey Senozhatsky
  2016-12-24 14:09 ` [PATCH 1/2] printk: drop call_console_drivers() unused param Sergey Senozhatsky
  2016-12-24 14:09 ` [PATCH 2/2] printk: always report lost messages on serial console Sergey Senozhatsky
  0 siblings, 2 replies; 18+ messages in thread
From: Sergey Senozhatsky @ 2016-12-24 14:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Steven Rostedt, Peter Hurley, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

Hello,

	Two patches: trivial clean up and console_unlock() "fix".

The `printk messages dropped' report is printed as part of actual
kernel message, that's why we do 'text + len, sizeof(text) - len'
later in msg_print_text().

The problem here is that we may eventually skip the message, and
thus lose the 'printk messages dropped' report, if `console_loglevel'
check tells us to do so. Missing kernel messages in serial log
together with the missing 'printk messages dropped' can be quite
confusing.


There are two options to address it:
a) forbid suppress_message_printing() if we know that we must print
   `printk messages dropped'

b) print `printk messages dropped' as a standalone message.


Sergey Senozhatsky (2):
  printk: drop call_console_drivers() unused param
  printk: always report lost messages on serial console

 kernel/printk/printk.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] printk: drop call_console_drivers() unused param
  2016-12-24 14:09 [PATCH 0/2] printk: always report dropped messages Sergey Senozhatsky
@ 2016-12-24 14:09 ` Sergey Senozhatsky
  2017-01-03 10:22   ` Petr Mladek
  2016-12-24 14:09 ` [PATCH 2/2] printk: always report lost messages on serial console Sergey Senozhatsky
  1 sibling, 1 reply; 18+ messages in thread
From: Sergey Senozhatsky @ 2016-12-24 14:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Steven Rostedt, Peter Hurley, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

We do suppress_message_printing() check before we call
call_console_drivers() now, so `level' param is not needed
anymore.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e2cdd87e7a63..11a9842a2f47 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1510,8 +1510,7 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
  */
-static void call_console_drivers(int level,
-				 const char *ext_text, size_t ext_len,
+static void call_console_drivers(const char *ext_text, size_t ext_len,
 				 const char *text, size_t len)
 {
 	struct console *con;
@@ -1895,8 +1894,7 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
 static ssize_t msg_print_ext_body(char *buf, size_t size,
 				  char *dict, size_t dict_len,
 				  char *text, size_t text_len) { return 0; }
-static void call_console_drivers(int level,
-				 const char *ext_text, size_t ext_len,
+static void call_console_drivers(const char *ext_text, size_t ext_len,
 				 const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg,
 			     bool syslog, char *buf, size_t size) { return 0; }
@@ -2220,7 +2218,6 @@ void console_unlock(void)
 		struct printk_log *msg;
 		size_t ext_len = 0;
 		size_t len;
-		int level;
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		if (seen_seq != log_next_seq) {
@@ -2243,8 +2240,7 @@ void console_unlock(void)
 			break;
 
 		msg = log_from_idx(console_idx);
-		level = msg->level;
-		if (suppress_message_printing(level)) {
+		if (suppress_message_printing(msg->level)) {
 			/*
 			 * Skip record we have buffered and already printed
 			 * directly to the console when we received it, and
@@ -2270,7 +2266,7 @@ void console_unlock(void)
 		raw_spin_unlock(&logbuf_lock);
 
 		stop_critical_timings();	/* don't trace print latency */
-		call_console_drivers(level, ext_text, ext_len, text, len);
+		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 		local_irq_restore(flags);
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] printk: always report lost messages on serial console
  2016-12-24 14:09 [PATCH 0/2] printk: always report dropped messages Sergey Senozhatsky
  2016-12-24 14:09 ` [PATCH 1/2] printk: drop call_console_drivers() unused param Sergey Senozhatsky
@ 2016-12-24 14:09 ` Sergey Senozhatsky
  2017-01-03 14:55   ` Petr Mladek
  2017-01-09 16:56   ` Petr Mladek
  1 sibling, 2 replies; 18+ messages in thread
From: Sergey Senozhatsky @ 2016-12-24 14:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Petr Mladek, Steven Rostedt, Peter Hurley, linux-kernel,
	Sergey Senozhatsky, Sergey Senozhatsky

The "printk messages dropped" report is 'attached' to a kernel
message located at console_idx offset. This does not work well
if we skip that message due to loglevel filtering, because in
this case we also skip/lose dropped message report.

Disable suppress_message_printing() loglevel filtering if we
must report "printk messages dropped" condition.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 kernel/printk/printk.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 11a9842a2f47..6a7ebcb0bb6e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2218,6 +2218,7 @@ void console_unlock(void)
 		struct printk_log *msg;
 		size_t ext_len = 0;
 		size_t len;
+		bool report_dropped_msg = false;
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		if (seen_seq != log_next_seq) {
@@ -2232,6 +2233,7 @@ void console_unlock(void)
 			/* messages are gone, move to first one */
 			console_seq = log_first_seq;
 			console_idx = log_first_idx;
+			report_dropped_msg = true;
 		} else {
 			len = 0;
 		}
@@ -2240,7 +2242,8 @@ void console_unlock(void)
 			break;
 
 		msg = log_from_idx(console_idx);
-		if (suppress_message_printing(msg->level)) {
+		if (!report_dropped_msg &&
+				suppress_message_printing(msg->level)) {
 			/*
 			 * Skip record we have buffered and already printed
 			 * directly to the console when we received it, and
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] printk: drop call_console_drivers() unused param
  2016-12-24 14:09 ` [PATCH 1/2] printk: drop call_console_drivers() unused param Sergey Senozhatsky
@ 2017-01-03 10:22   ` Petr Mladek
  2017-01-09 13:49     ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2017-01-03 10:22 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Steven Rostedt, Peter Hurley, linux-kernel,
	Sergey Senozhatsky

On Sat 2016-12-24 23:09:01, Sergey Senozhatsky wrote:
> We do suppress_message_printing() check before we call
> call_console_drivers() now, so `level' param is not needed
> anymore.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/printk/printk.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index e2cdd87e7a63..11a9842a2f47 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1510,8 +1510,7 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
>   * log_buf[start] to log_buf[end - 1].
>   * The console_lock must be held.
>   */
> -static void call_console_drivers(int level,
> -				 const char *ext_text, size_t ext_len,
> +static void call_console_drivers(const char *ext_text, size_t ext_len,
>  				 const char *text, size_t len)

Yup, this patch makes sense on its own. The level parameter is unused
since the commit cf7754441c563230ed7 ("printk: introduce
suppress_message_printing()").

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2016-12-24 14:09 ` [PATCH 2/2] printk: always report lost messages on serial console Sergey Senozhatsky
@ 2017-01-03 14:55   ` Petr Mladek
  2017-01-03 15:47     ` Sergey Senozhatsky
  2017-01-09 16:56   ` Petr Mladek
  1 sibling, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2017-01-03 14:55 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Steven Rostedt, Peter Hurley, linux-kernel,
	Sergey Senozhatsky

On Sat 2016-12-24 23:09:02, Sergey Senozhatsky wrote:
> The "printk messages dropped" report is 'attached' to a kernel
> message located at console_idx offset. This does not work well
> if we skip that message due to loglevel filtering, because in
> this case we also skip/lose dropped message report.

Good catch!

> Disable suppress_message_printing() loglevel filtering if we
> must report "printk messages dropped" condition.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---
>  kernel/printk/printk.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 11a9842a2f47..6a7ebcb0bb6e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2218,6 +2218,7 @@ void console_unlock(void)
>  		struct printk_log *msg;
>  		size_t ext_len = 0;
>  		size_t len;
> +		bool report_dropped_msg = false;
>  
>  		raw_spin_lock_irqsave(&logbuf_lock, flags);
>  		if (seen_seq != log_next_seq) {
> @@ -2232,6 +2233,7 @@ void console_unlock(void)
>  			/* messages are gone, move to first one */
>  			console_seq = log_first_seq;
>  			console_idx = log_first_idx;
> +			report_dropped_msg = true;
>  		} else {
>  			len = 0;
>  		}
> @@ -2240,7 +2242,8 @@ void console_unlock(void)
>  			break;
>  
>  		msg = log_from_idx(console_idx);
> -		if (suppress_message_printing(msg->level)) {
> +		if (!report_dropped_msg &&
> +				suppress_message_printing(msg->level)) {
>  			/*
>  			 * Skip record we have buffered and already printed
>  			 * directly to the console when we received it, and

This causes the opposite problem. We might print a message that was supposed
to be suppressed. I think that it is not a good idea in this
situation. I played with it and cooked a patch, see below.

Please, do not feel offended. I do not want to take you credits
for finding the problem.

But the printk code is very twisted and console_unlock() is one
of the worst pieces. Solution based on my concerns could make
it even worse. I do not want to push you into clean ups and
tried it myself.

The result is not as good as I hoped. But it it is not worse, IMHO.
Which is good given the circumstances.

Feel free to provide even better one and use pieces from my patch.
I also think about spling it into two patches and make msg_print()
helper in a separate patch.



>From 976073de3e0f3daa4f1d700ef605165d97533c9b Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Tue, 3 Jan 2017 14:32:04 +0100
Subject: [PATCH] printk: Always report lost messages on serial console

The "printk messages dropped" report is 'attached' to a kernel
message located at console_idx offset. This does not work well
if we skip that message due to loglevel filtering, because in
this case we also skip/lose dropped message report.

A simple solution would be to ignore the level and always print
the warning with the next message.

But the situation suggests that we are under a high load and
could not afford printing less important messages.

Also we could not print only the warning because we might lose
even more messages in the meantime.

The best solution seems to be to print the warning with
the next visible message.

This patch tries to keep readability of the code. It puts
msg_print*() calls into a helper function. Also it hides there
the visibility check. As a result we could have only one copy of

	console_idx = log_next(console_idx);
	console_seq++;

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 61 ++++++++++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 27 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0dbde4e7bb15..7cc2e7effdc3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1234,6 +1234,28 @@ static size_t msg_print_text(const struct printk_log *msg, bool syslog, char *bu
 	return len;
 }
 
+static bool console_check_and_print_msg(u32 console_idx,
+			     char *text, size_t size, size_t *len,
+			     char *ext_text, size_t ext_size, size_t *ext_len)
+{
+	struct printk_log *msg = log_from_idx(console_idx);
+
+	if (suppress_message_printing(msg->level))
+		return false;
+
+	*len += msg_print_text(msg, false, text, size);
+	if (nr_ext_console_drivers) {
+		*ext_len = msg_print_ext_header(ext_text, ext_size,
+						msg, console_seq);
+		*ext_len += msg_print_ext_body(ext_text + *ext_len,
+					       ext_size - *ext_len,
+					       log_dict(msg), msg->dict_len,
+					       log_text(msg), msg->text_len);
+	}
+
+	return true;
+}
+
 static int syslog_print(char __user *buf, int size)
 {
 	char *text;
@@ -2215,9 +2237,9 @@ void console_unlock(void)
 	}
 
 	for (;;) {
-		struct printk_log *msg;
+		bool printed_msg = false;
 		size_t ext_len = 0;
-		size_t len;
+		size_t len = 0;
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		if (seen_seq != log_next_seq) {
@@ -2232,37 +2254,22 @@ void console_unlock(void)
 			/* messages are gone, move to first one */
 			console_seq = log_first_seq;
 			console_idx = log_first_idx;
-		} else {
-			len = 0;
 		}
-skip:
-		if (console_seq == log_next_seq)
-			break;
 
-		msg = log_from_idx(console_idx);
-		if (suppress_message_printing(msg->level)) {
-			/*
-			 * Skip record we have buffered and already printed
-			 * directly to the console when we received it, and
-			 * record that has level above the console loglevel.
-			 */
+		/* Get the next message with a visible level */
+		while (console_seq < log_next_seq && !printed_msg) {
+			printed_msg = console_check_and_print_msg(console_seq,
+					text + len, sizeof(text) - len, &len,
+					ext_text, sizeof(ext_text), &ext_len);
+
 			console_idx = log_next(console_idx);
 			console_seq++;
-			goto skip;
 		}
 
-		len += msg_print_text(msg, false, text + len, sizeof(text) - len);
-		if (nr_ext_console_drivers) {
-			ext_len = msg_print_ext_header(ext_text,
-						sizeof(ext_text),
-						msg, console_seq);
-			ext_len += msg_print_ext_body(ext_text + ext_len,
-						sizeof(ext_text) - ext_len,
-						log_dict(msg), msg->dict_len,
-						log_text(msg), msg->text_len);
-		}
-		console_idx = log_next(console_idx);
-		console_seq++;
+		/* No warning and no visible message => done */
+		if (!len && !printed_msg)
+			break;
+
 		raw_spin_unlock(&logbuf_lock);
 
 		stop_critical_timings();	/* don't trace print latency */
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-03 14:55   ` Petr Mladek
@ 2017-01-03 15:47     ` Sergey Senozhatsky
  2017-01-03 16:53       ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: Sergey Senozhatsky @ 2017-01-03 15:47 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Steven Rostedt, Peter Hurley,
	linux-kernel, Sergey Senozhatsky

On (01/03/17 15:55), Petr Mladek wrote:
[..]
> This causes the opposite problem. We might print a message that was supposed
> to be suppressed.

so what? yes, we print a message that otherwise would have been suppressed.
not a big deal. at all. we are under high printk load and the best thing
we can do is to report "we are losing the messages" straight ahead. the
next 'visible' message may be seconds/minutes/forever away. think of a
printk() flood of messages with suppressed loglevel coming from CPUA-CPUX,
big enough to drain all 'visible' loglevel messages from CPUZ. we are
back to problem "a".

thus I want a simple bool flag and a simple rule: we see something - we say it.


[..]
> The best solution seems to be to print the warning with
> the next visible message.

not sure.

	-ss

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-03 15:47     ` Sergey Senozhatsky
@ 2017-01-03 16:53       ` Petr Mladek
  2017-01-04  2:46         ` Sergey Senozhatsky
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2017-01-03 16:53 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Steven Rostedt, Peter Hurley, linux-kernel,
	Sergey Senozhatsky

On Wed 2017-01-04 00:47:45, Sergey Senozhatsky wrote:
> On (01/03/17 15:55), Petr Mladek wrote:
> [..]
> > This causes the opposite problem. We might print a message that was supposed
> > to be suppressed.
> 
> so what? yes, we print a message that otherwise would have been suppressed.
> not a big deal. at all. we are under high printk load and the best thing
> we can do is to report "we are losing the messages" straight ahead. the
> next 'visible' message may be seconds/minutes/forever away. think of a
> printk() flood of messages with suppressed loglevel coming from CPUA-CPUX,
> big enough to drain all 'visible' loglevel messages from CPUZ. we are
> back to problem "a".
> 
> thus I want a simple bool flag and a simple rule: we see something - we say it.

So, you prefer to print some random debug message instead of an
emergency one? The console_level is there for a reason.

If there is a flood of messages, console_level = 1 and use your
solution, you might see:

** 1324 printk messages dropped ** <notice: random message>
** 4234 printk messages dropped ** <debug: random message>
** 3243 printk messages dropped ** <info: random message>
** 2343 printk messages dropped ** <debug: random message>

It will always drop a message because you always process only one
and many new appear in the meantime. While with my solution,
you should see:

** 1324 printk messages dropped ** <alert: random message>
** 523 printk messages dropped ** <emerg: random message>
** 324 printk messages dropped ** <emerg: random message>
** 345 printk messages dropped ** <alert: random message>

You will see messages filtered by the console_level. Also
less number of messages should get dropped because you quickly
skip the less important ones.

The filtering might be the only way to see the important
messages. On the other hand, the fact that messages are
dropped might be quessed from the context.

Your patch fixes a bug by introducing another bug that is
probably even more serious.


I understand that my patch is much more complex. But is
the final code realy more complex?

console_unlock() is too long. The helper for the msg_print()
calls makes sense on its own. The rest is just reshufling
of the conditions.

It actually would make sense to hide also the increment
of console_idx/console_seq into the helper function.
Especially console_idx is related to the struct printk_log
that is no longer accessed in console_unlock() directly.

Here is v2:

>From 82ff26726508c8a5a97409c7270d1f7abc7ed740 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Tue, 3 Jan 2017 14:32:04 +0100
Subject: [PATCH v2] printk: Always report lost messages on serial console

The "printk messages dropped" report is 'attached' to a kernel
message located at console_idx offset. This does not work well
if we skip that message due to loglevel filtering, because in
this case we also skip/lose dropped message report.

A simple solution would be to ignore the level and always print
the warning with the next message.

But the situation suggests that we are under a high load and
could not afford printing less important messages.

Also we could not print only the warning because we might lose
even more messages in the meantime.

The best solution seems to be to print the warning with
the next visible message.

This patch tries to keep readability of the code. It puts
msg_print*() calls into a helper function. Also it hides there
the visibility check and increment of console_idx/console_seq.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 70 +++++++++++++++++++++++++++++---------------------
 1 file changed, 41 insertions(+), 29 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0dbde4e7bb15..ae80dc3284dc 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2161,6 +2161,35 @@ static inline int can_use_console(void)
 	return cpu_online(raw_smp_processor_id()) || have_callable_console();
 }
 
+static bool console_check_and_print_next_msg(
+		char *text, size_t size, size_t *len,
+		char *ext_text, size_t ext_size, size_t *ext_len)
+{
+	struct printk_log *msg = log_from_idx(console_idx);
+	bool ret = true;
+
+	if (suppress_message_printing(msg->level)) {
+		ret = false;
+		goto out;
+	}
+
+	*len += msg_print_text(msg, false, text, size);
+	if (nr_ext_console_drivers) {
+		*ext_len = msg_print_ext_header(ext_text, ext_size,
+						msg, console_seq);
+		*ext_len += msg_print_ext_body(ext_text + *ext_len,
+					       ext_size - *ext_len,
+					       log_dict(msg), msg->dict_len,
+					       log_text(msg), msg->text_len);
+	}
+
+out:
+	console_idx = log_next(console_idx);
+	console_seq++;
+
+	return ret;
+}
+
 /**
  * console_unlock - unlock the console system
  *
@@ -2215,9 +2244,9 @@ void console_unlock(void)
 	}
 
 	for (;;) {
-		struct printk_log *msg;
+		bool visible_msg = false;
 		size_t ext_len = 0;
-		size_t len;
+		size_t len = 0;
 
 		raw_spin_lock_irqsave(&logbuf_lock, flags);
 		if (seen_seq != log_next_seq) {
@@ -2232,37 +2261,20 @@ void console_unlock(void)
 			/* messages are gone, move to first one */
 			console_seq = log_first_seq;
 			console_idx = log_first_idx;
-		} else {
-			len = 0;
 		}
-skip:
-		if (console_seq == log_next_seq)
-			break;
 
-		msg = log_from_idx(console_idx);
-		if (suppress_message_printing(msg->level)) {
-			/*
-			 * Skip record we have buffered and already printed
-			 * directly to the console when we received it, and
-			 * record that has level above the console loglevel.
-			 */
-			console_idx = log_next(console_idx);
-			console_seq++;
-			goto skip;
-		}
+		/* Get the next message with a visible level */
+		while (console_seq < log_next_seq && !visible_msg) {
+			visible_msg = console_check_and_print_next_msg(
+					text + len, sizeof(text) - len, &len,
+					ext_text, sizeof(ext_text), &ext_len);
 
-		len += msg_print_text(msg, false, text + len, sizeof(text) - len);
-		if (nr_ext_console_drivers) {
-			ext_len = msg_print_ext_header(ext_text,
-						sizeof(ext_text),
-						msg, console_seq);
-			ext_len += msg_print_ext_body(ext_text + ext_len,
-						sizeof(ext_text) - ext_len,
-						log_dict(msg), msg->dict_len,
-						log_text(msg), msg->text_len);
 		}
-		console_idx = log_next(console_idx);
-		console_seq++;
+
+		/* No warning and no visible message => done */
+		if (!len && !visible_msg)
+			break;
+
 		raw_spin_unlock(&logbuf_lock);
 
 		stop_critical_timings();	/* don't trace print latency */
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-03 16:53       ` Petr Mladek
@ 2017-01-04  2:46         ` Sergey Senozhatsky
  2017-01-04 10:52           ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: Sergey Senozhatsky @ 2017-01-04  2:46 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Steven Rostedt, Peter Hurley,
	linux-kernel, Sergey Senozhatsky

On (01/03/17 17:53), Petr Mladek wrote:
> On Wed 2017-01-04 00:47:45, Sergey Senozhatsky wrote:
> > On (01/03/17 15:55), Petr Mladek wrote:
> > [..]
> > > This causes the opposite problem. We might print a message that was supposed
> > > to be suppressed.
> > 
> > so what? yes, we print a message that otherwise would have been suppressed.
> > not a big deal. at all. we are under high printk load and the best thing
> > we can do is to report "we are losing the messages" straight ahead. the
> > next 'visible' message may be seconds/minutes/forever away. think of a
> > printk() flood of messages with suppressed loglevel coming from CPUA-CPUX,
> > big enough to drain all 'visible' loglevel messages from CPUZ. we are
> > back to problem "a".
> > 
> > thus I want a simple bool flag and a simple rule: we see something - we say it.
> 
> So, you prefer to print some random debug message instead of an
> emergency one?

yep. because console_unlock() is not the right place to address printk()
abuse, and console_unlock() cannot address it. the only way to fix it is
to reduce the amount of printk() calls (well, there is one more thing
probably and may be but not really *).


> It will always drop a message because you always process only one
> and many new appear in the meantime. While with my solution,
> you should see:
> 
> ** 1324 printk messages dropped ** <alert: random message>
> ** 523 printk messages dropped ** <emerg: random message>
> ** 324 printk messages dropped ** <emerg: random message>
> ** 345 printk messages dropped ** <alert: random message>

once you started losing the messages because of printk() flood you will
lose them no matter what you do. console_unlock() cannot force printk() CPUs
to add less messages to the logbuf, so we magically can have enough time to
print the message on the consoles. any call to console drivers leads to lost
messages. there is no difference. 

this is from the real serial logs I'm looking at right now. we attach
"bad news" to 'critical' messages only:

...
[   32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
** 150 printk messages dropped ** [   32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
** 75 printk messages dropped ** [   32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
** 12 printk messages dropped ** [   32.941940] ..
** 2 printk messages dropped ** [   32.941951] ..
** 10 printk messages dropped ** [   32.941992] ..
** 1 printk messages dropped ** [   32.941999] ..
...

note how many critical messages I lost in consecutive console drivers calls
because console_unlock() was printing other critical messages.
no matter what we filter-out in console_unlock() we are still far-far-far
behind the CPUs that flood the printk buffer. and those CPUs will be happy
to drain any critical/visible message from the logbuf.




* may be can do something like this:

two logbuf buffers. one for `everything' -- both suppressed and visible
loglevels. this one is also used by dmesg. the other one if for messages
that won't be suppressed. we call_console_drivers() on that buffer.
so vprintk_emit() becomes

vprintk_emit()
{
	spin_lock logbuf_lock

	text = sprintf(...)

	log_store(logbuf)
	if (!suppress_message_printing(level))
		log_store(printing_lofbuf)

	spin_ulock logbuf_lock
}

and console_unlock() reads printing_logbuf only. but we still can lose
messages even from filtered printing_logbuf.

	-ss

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-04  2:46         ` Sergey Senozhatsky
@ 2017-01-04 10:52           ` Petr Mladek
  2017-01-04 13:34             ` Sergey Senozhatsky
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2017-01-04 10:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Steven Rostedt, Peter Hurley,
	linux-kernel

On Wed 2017-01-04 11:46:50, Sergey Senozhatsky wrote:
> On (01/03/17 17:53), Petr Mladek wrote:
> > On Wed 2017-01-04 00:47:45, Sergey Senozhatsky wrote:
> > > On (01/03/17 15:55), Petr Mladek wrote:
> > > [..]
> > > > This causes the opposite problem. We might print a message that was supposed
> > > > to be suppressed.
> > > 
> > > so what? yes, we print a message that otherwise would have been suppressed.
> > > not a big deal. at all. we are under high printk load and the best thing
> > > we can do is to report "we are losing the messages" straight ahead. the
> > > next 'visible' message may be seconds/minutes/forever away. think of a
> > > printk() flood of messages with suppressed loglevel coming from CPUA-CPUX,
> > > big enough to drain all 'visible' loglevel messages from CPUZ. we are
> > > back to problem "a".
> > > 
> > > thus I want a simple bool flag and a simple rule: we see something - we say it.
> > 
> > So, you prefer to print some random debug message instead of an
> > emergency one?
> 
> yep. because console_unlock() is not the right place to address printk()
> abuse, and console_unlock() cannot address it. the only way to fix it is
> to reduce the amount of printk() calls (well, there is one more thing
> probably and may be but not really *).

Yes and no. Please note that writing into logbug is fast while
writing to the serial console is slow. This is why we have
console_level. It allows to filter only the critical messages
for the slow output.


> > It will always drop a message because you always process only one
> > and many new appear in the meantime. While with my solution,
> > you should see:
> > 
> > ** 1324 printk messages dropped ** <alert: random message>
> > ** 523 printk messages dropped ** <emerg: random message>
> > ** 324 printk messages dropped ** <emerg: random message>
> > ** 345 printk messages dropped ** <alert: random message>
> 
> once you started losing the messages because of printk() flood you will
> lose them no matter what you do. console_unlock() cannot force printk() CPUs
> to add less messages to the logbuf, so we magically can have enough time to
> print the message on the consoles. any call to console drivers leads to lost
 > messages. there is no difference.
> 
> this is from the real serial logs I'm looking at right now. we attach
> "bad news" to 'critical' messages only:
> 
> ...
> [   32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
> ** 150 printk messages dropped ** [   32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
> ** 75 printk messages dropped ** [   32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
> ** 12 printk messages dropped ** [   32.941940] ..
> ** 2 printk messages dropped ** [   32.941951] ..
> ** 10 printk messages dropped ** [   32.941992] ..
> ** 1 printk messages dropped ** [   32.941999] ..
> ...

Do you see how useless the above messages are, please?
Are they really printed with KERN_EMERG or KERN_ALERT prefix?

IMHO, the point of the log levels and console_level is
to have a chance to see the more informative messages
on the slow medium (serial console).

BTW: It is questionable if messages with LOGLEVEL_DEFAULT are
always printed but this is another story.


> note how many critical messages I lost in consecutive console drivers calls
> because console_unlock() was printing other critical messages.
> no matter what we filter-out in console_unlock() we are still far-far-far
> behind the CPUs that flood the printk buffer. and those CPUs will be happy
> to drain any critical/visible message from the logbuf.
> 
> * may be can do something like this:
> 
> two logbuf buffers. one for `everything' -- both suppressed and visible
> loglevels. this one is also used by dmesg. the other one if for messages
> that won't be suppressed. we call_console_drivers() on that buffer.
> so vprintk_emit() becomes
> 
> vprintk_emit()
> {
> 	spin_lock logbuf_lock
> 
> 	text = sprintf(...)
> 
> 	log_store(logbuf)
> 	if (!suppress_message_printing(level))
> 		log_store(printing_lofbuf)
> 
> 	spin_ulock logbuf_lock
> }
> 
> and console_unlock() reads printing_logbuf only. but we still can lose
> messages even from filtered printing_logbuf.

Please, do not do this. IMHO, it will not improve the situation much.
Let me repeat. Writing into logbuf and filtering messages is
relatively fast. The slow thing is printing to the serial console.

Also writing and filtering is done under logbuf_lock
which blocks other writers. This is a natural throttling of
other writers. On the other hand, the console handling is done
without logbuf_lock and other writers might do anything in the
meantime.

It might make sense to allow filtering messages that are
stored to the logbuffer. But this is orthogonal to the
console output filtering.

Also we should improve the API to make it easier to throotle
the same messages. Or allow to throotle all messages.

Anyway, your patch breaks console output filtering. This is
why I am against it and propose another solution.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-04 10:52           ` Petr Mladek
@ 2017-01-04 13:34             ` Sergey Senozhatsky
  2017-01-04 15:26               ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: Sergey Senozhatsky @ 2017-01-04 13:34 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Steven Rostedt, Peter Hurley, linux-kernel

On (01/04/17 11:52), Petr Mladek wrote:
[..]
> > this is from the real serial logs I'm looking at right now. we attach
> > "bad news" to 'critical' messages only:
> > 
> > ...
> > [   32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
> > ** 150 printk messages dropped ** [   32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
> > ** 75 printk messages dropped ** [   32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
> > ** 12 printk messages dropped ** [   32.941940] ..
> > ** 2 printk messages dropped ** [   32.941951] ..
> > ** 10 printk messages dropped ** [   32.941992] ..
> > ** 1 printk messages dropped ** [   32.941999] ..
> > ...
> 
> Do you see how useless the above messages are, please?

what... these lost messages were of extreme importance. I can't tell
the exactly the loglevel, but I'm sure it was at least pr_err() level.
these were like really important messages, unlike the ones that got
suppressed/filtered-out.

along with these lost messages that were supposed to be printed, I have
regions of lost kernel messages (?) with no reports of lost messages from
console_unlock() (!). and that's the only thing I'm fixing here. and the
only thing we can fix. no permutation of console_unlock() lines will make
the buffer bigger or printk flooding CPUs nicer or serial console driver
faster.

once we unlocked the logbuf lock in console_unlock() we lost the race
against the printk() flooding CPUs. all, or some, of the remaining
messages, no matter how important and critical, will be drained. the
next time we lock the logbuf again the logbuf will not be the same.
the only meaningful thing we can print now is "XXX printk messages
dropped". we don't know what was in those messages and no one will ever
find out. it's gone.

and the options here are
	"print 1 random message out of XXX or XXXX lost messages"
vs
	"print 1 random message out of XXX or XXXX lost messages"

and that's really it. except that in the second case we shuffle
console_unlock() lines with no gain. and I'm not sure I understand
why we even discuss it. both options are absolutely equally terrible,
because the root cause of the problem is in completely different
place.

	-ss

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-04 13:34             ` Sergey Senozhatsky
@ 2017-01-04 15:26               ` Petr Mladek
  2017-01-05  2:30                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2017-01-04 15:26 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Steven Rostedt, Peter Hurley,
	linux-kernel

On Wed 2017-01-04 22:34:48, Sergey Senozhatsky wrote:
> On (01/04/17 11:52), Petr Mladek wrote:
> [..]
> > > this is from the real serial logs I'm looking at right now. we attach
> > > "bad news" to 'critical' messages only:
> > > 
> > > ...
> > > [   32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
> > > ** 150 printk messages dropped ** [   32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
> > > ** 75 printk messages dropped ** [   32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
> > > ** 12 printk messages dropped ** [   32.941940] ..
> > > ** 2 printk messages dropped ** [   32.941951] ..
> > > ** 10 printk messages dropped ** [   32.941992] ..
> > > ** 1 printk messages dropped ** [   32.941999] ..
> > > ...

OK, it is possible that I miss-interpreted the message. It looked like
a random memory dump that did not make sense without a context.


> > Do you see how useless the above messages are, please?
> 
> what... these lost messages were of extreme importance. I can't tell
> the exactly the loglevel, but I'm sure it was at least pr_err() level.
> these were like really important messages, unlike the ones that got
> suppressed/filtered-out.

It means that you were lucky and you saw critical messages instead
of some random debugging ones.


> along with these lost messages that were supposed to be printed, I have
> regions of lost kernel messages (?) with no reports of lost messages from
> console_unlock() (!). and that's the only thing I'm fixing here.

My patch fixes it as well. But it also keeps the function of
console_level filtering.


> and the
> only thing we can fix. no permutation of console_unlock() lines will make
> the buffer bigger or printk flooding CPUs nicer or serial console driver
> faster.

We should stay on a constructive note. I never wrote that my patch
would make the buffer bigger or the serial console faster.


> once we unlocked the logbuf lock in console_unlock() we lost the race
> against the printk() flooding CPUs.

My patch did not have ambition to solve this problem.


> and the options here are
> 	"print 1 random message out of XXX or XXXX lost messages"
> vs
> 	"print 1 random message out of XXX or XXXX lost messages"

And this is not fully correct and probably the root of
the misunderstanding. The difference between your patch
and mine patch is:

	"always print '%u printk messages dropped'" +
	"print 1 random message out of XXX or XXXX lost messages"

vs

	"always print '%u printk messages dropped'" +
	"print 1 random message with level under console_level
	 out of XXX or XXXX lost messages"

and that's it. I am sorry if I was not able to explain this
a more clear way.

I think that we both should take a deep breath and calm down
a bit. I am afraid that I used some formulations that made
you angry and put us in an offensive mode. Maybe I was not
able to clearly describe my concerns and their severity.
Maybe you feel offended because I produced an alternative
patch and did not keep enough credits to you. I am sorry
for this. I will try better next time.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-04 15:26               ` Petr Mladek
@ 2017-01-05  2:30                 ` Sergey Senozhatsky
  2017-01-05 10:53                   ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: Sergey Senozhatsky @ 2017-01-05  2:30 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Steven Rostedt, Peter Hurley, linux-kernel

On (01/04/17 16:26), Petr Mladek wrote:
> On Wed 2017-01-04 22:34:48, Sergey Senozhatsky wrote:
> > On (01/04/17 11:52), Petr Mladek wrote:
> > [..]
> > > > this is from the real serial logs I'm looking at right now. we attach
> > > > "bad news" to 'critical' messages only:
> > > > 
> > > > ...
> > > > [   32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
> > > > ** 150 printk messages dropped ** [   32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
> > > > ** 75 printk messages dropped ** [   32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
> > > > ** 12 printk messages dropped ** [   32.941940] ..
> > > > ** 2 printk messages dropped ** [   32.941951] ..
> > > > ** 10 printk messages dropped ** [   32.941992] ..
> > > > ** 1 printk messages dropped ** [   32.941999] ..
> > > > ...
> 
> OK, it is possible that I miss-interpreted the message. It looked like
> a random memory dump that did not make sense without a context.
> 
> 
> > > Do you see how useless the above messages are, please?
> > 
> > what... these lost messages were of extreme importance. I can't tell
> > the exactly the loglevel, but I'm sure it was at least pr_err() level.
> > these were like really important messages, unlike the ones that got
> > suppressed/filtered-out.
> 
> It means that you were lucky and you saw critical messages instead
> of some random debugging ones.

this is funny. ok... let me tell you my version.
I saw an incomplete serial log with lost important messages. the serial
log was nothing but garbage. zero value. I could simply `rm screenlog.0'
it. end of story.

and no matter what loglevel we attach the "printk messages dropped"
to we always will lose XYZ important messages in the given circumstances.
and that "let's print 1 out of thousands lost critical messages, so the
serial log will make sense" is a little bit far from being true. sorry,
it is what it is.


> > and the options here are
> > 	"print 1 random message out of XXX or XXXX lost messages"
> > vs
> > 	"print 1 random message out of XXX or XXXX lost messages"
> 
> And this is not fully correct and probably the root of
> the misunderstanding. The difference between your patch
> and mine patch is:
> 
> 	"always print '%u printk messages dropped'" +
> 	"print 1 random message out of XXX or XXXX lost messages"
> 
> vs
> 
> 	"always print '%u printk messages dropped'" +
> 	"print 1 random message with level under console_level
> 	 out of XXX or XXXX lost messages"
> 
> and that's it. I am sorry if I was not able to explain this
> a more clear way.


... I understand what your patch is doing. see the serial log atop of
this message, this is from the 'attach "printk messages dropped to a
visible loglevel"' approach. what I don't understand is why do you
claim that it produces significantly more meaningful/useful serial logs.
because it does not. it produces the 'rm screenlog.0' material. we
can't protect/save/take care of/whatever the logbuf messages once we
unlock the logbuf lock. and the point is - for a guy who reads the
incomplete serial log 'print 1 random message out of XXX or XXXX lost
messages' is pretty much the same as 'print 1 random message of visible
loglevel out of XXX or XXXX lost messages'. because the really important
part here is 'you see 1 message out of XXXX', and there is no way to
reconstruct those XXXX lost messages, no matter how small the XXXX is:

[   x.xxxx] Call Trace:
** 9 printk messages dropped ** [   x.xxxx] ---[ end trace ]---

	-ss

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-05  2:30                 ` Sergey Senozhatsky
@ 2017-01-05 10:53                   ` Petr Mladek
  0 siblings, 0 replies; 18+ messages in thread
From: Petr Mladek @ 2017-01-05 10:53 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Steven Rostedt, Peter Hurley,
	linux-kernel

On Thu 2017-01-05 11:30:47, Sergey Senozhatsky wrote:
> On (01/04/17 16:26), Petr Mladek wrote:
> > On Wed 2017-01-04 22:34:48, Sergey Senozhatsky wrote:
> > > On (01/04/17 11:52), Petr Mladek wrote:
> > > [..]
> > > > > this is from the real serial logs I'm looking at right now. we attach
> > > > > "bad news" to 'critical' messages only:
> > > > > 
> > > > > ...
> > > > > [   32.941061] bc00: b65dc0d8 b65dc6d0 ae1fbc7c b65c11c5 b563c9c4 00000001 b65dc6d0 b0a62f74
> > > > > ** 150 printk messages dropped ** [   32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
> > > > > ** 75 printk messages dropped ** [   32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000
> > > > > ** 12 printk messages dropped ** [   32.941940] ..
> > > > > ** 2 printk messages dropped ** [   32.941951] ..
> > > > > ** 10 printk messages dropped ** [   32.941992] ..
> > > > > ** 1 printk messages dropped ** [   32.941999] ..
> > > > > ...

Please, be more explicit. What exactly was of extreme imporance
in the output above?

Was it the information about lost messages?

    ** XXX printk messages dropped **

Or was is some of the data?

    [   32.941614] cee0: 00000081 ae1fcef0 00000038 00000000 b5369000 ae1fcf00 ae1fd0f0 00000000
    [   32.941892] d860: 00056608 af203848 00000000 0004088c 000000d0 00000000 00000000 00000000


> > OK, it is possible that I miss-interpreted the message. It looked like
> > a random memory dump that did not make sense without a context.
> 
> this is funny. ok... let me tell you my version.
> I saw an incomplete serial log with lost important messages. the serial
> log was nothing but garbage. zero value. I could simply `rm screenlog.0'
> it. end of story.
> 
> and no matter what loglevel we attach the "printk messages dropped"
> to we always will lose XYZ important messages in the given circumstances.
> and that "let's print 1 out of thousands lost critical messages, so the
> serial log will make sense" is a little bit far from being true. sorry,
> it is what it is.

I never claimed that you would see more messages with my patch.
I claimed that you should see messages with the requested severity
level with my patch.

If you get better results with random messages than I see two
possibilities:

  + some/many messages have assigned wrong level

  + the message levels are useless in general


> > > and the options here are
> > > 	"print 1 random message out of XXX or XXXX lost messages"
> > > vs
> > > 	"print 1 random message out of XXX or XXXX lost messages"
> > 
> > And this is not fully correct and probably the root of
> > the misunderstanding. The difference between your patch
> > and mine patch is:
> > 
> > 	"always print '%u printk messages dropped'" +
> > 	"print 1 random message out of XXX or XXXX lost messages"
> > 
> > vs
> > 
> > 	"always print '%u printk messages dropped'" +
> > 	"print 1 random message with level under console_level
> > 	 out of XXX or XXXX lost messages"
> > 
> > and that's it. I am sorry if I was not able to explain this
> > a more clear way.
> 
> 
> ... I understand what your patch is doing. see the serial log atop of
> this message, this is from the 'attach "printk messages dropped to a
> visible loglevel"' approach.

atop? I am lost. There is only one real-life output atop of this
message and you claimed that it was of extreme importance. I thought
that you got it with your patch.


> what I don't understand is why do you
> claim that it produces significantly more meaningful/useful serial logs.
> because it does not. it produces the 'rm screenlog.0' material. we
> can't protect/save/take care of/whatever the logbuf messages once we
> unlock the logbuf lock. and the point is - for a guy who reads the
> incomplete serial log 'print 1 random message out of XXX or XXXX lost
> messages' is pretty much the same as 'print 1 random message of visible
> loglevel out of XXX or XXXX lost messages'. because the really important
> part here is 'you see 1 message out of XXXX', and there is no way to
> reconstruct those XXXX lost messages, no matter how small the XXXX is:

If it does not matter what message we print then why do we have
the log levels in the first place?


> [   x.xxxx] Call Trace:
> ** 9 printk messages dropped ** [   x.xxxx] ---[ end trace ]---

This is unfortunate. This line does not include any useful
information but it is part of an important blob of lines.

A solution might be to proceed more lines of the same level
at once. One line usually is not enough. We will still get
only random blobs of lines but at least some of them should
be more usable than single random lines.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] printk: drop call_console_drivers() unused param
  2017-01-03 10:22   ` Petr Mladek
@ 2017-01-09 13:49     ` Petr Mladek
  0 siblings, 0 replies; 18+ messages in thread
From: Petr Mladek @ 2017-01-09 13:49 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Steven Rostedt, Peter Hurley, linux-kernel,
	Sergey Senozhatsky

On Tue 2017-01-03 11:22:30, Petr Mladek wrote:
> On Sat 2016-12-24 23:09:01, Sergey Senozhatsky wrote:
> > We do suppress_message_printing() check before we call
> > call_console_drivers() now, so `level' param is not needed
> > anymore.
> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > ---
> >  kernel/printk/printk.c | 12 ++++--------
> >  1 file changed, 4 insertions(+), 8 deletions(-)
> > 
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index e2cdd87e7a63..11a9842a2f47 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1510,8 +1510,7 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
> >   * log_buf[start] to log_buf[end - 1].
> >   * The console_lock must be held.
> >   */
> > -static void call_console_drivers(int level,
> > -				 const char *ext_text, size_t ext_len,
> > +static void call_console_drivers(const char *ext_text, size_t ext_len,
> >  				 const char *text, size_t len)
> 
> Yup, this patch makes sense on its own. The level parameter is unused
> since the commit cf7754441c563230ed7 ("printk: introduce
> suppress_message_printing()").
> 
> Reviewed-by: Petr Mladek <pmladek@suse.com>

The patch makes sense on its own. I have pushed it into
the for-4.11 branch, see
https://git.kernel.org/cgit/linux/kernel/git/pmladek/printk.git/commit/?h=for-4.11&id=0e91b77060387275b6762c6688e3bddd1b808e2c

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2016-12-24 14:09 ` [PATCH 2/2] printk: always report lost messages on serial console Sergey Senozhatsky
  2017-01-03 14:55   ` Petr Mladek
@ 2017-01-09 16:56   ` Petr Mladek
  2017-01-10  8:49     ` Sergey Senozhatsky
  1 sibling, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2017-01-09 16:56 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Steven Rostedt, Peter Hurley, linux-kernel,
	Sergey Senozhatsky

On Sat 2016-12-24 23:09:02, Sergey Senozhatsky wrote:
> The "printk messages dropped" report is 'attached' to a kernel
> message located at console_idx offset. This does not work well
> if we skip that message due to loglevel filtering, because in
> this case we also skip/lose dropped message report.
> 
> Disable suppress_message_printing() loglevel filtering if we
> must report "printk messages dropped" condition.

I have looked at this once again. Please, do you have a real-life
example when the "printk messages dropped" was not printed?

I ask because the original code looks like this this:

		if (console_seq < log_first_seq) {
			<store warning into "text" buffer>
			console_seq = log_first_seq;
			console_idx = log_first_idx;

skip:
		if (console_seq == log_next_seq)
			break;

		msg = log_from_idx(console_idx);
		if (suppress_message_printing(msg->level)) {
			console_idx = log_next(console_idx);
			console_seq++;
			goto skip;
		}

		<add visible messsage into "text" buffer>

		console_idx = log_next(console_idx);
		console_seq++;
		raw_spin_unlock(&logbuf_lock);

		<push "text" buffer to console>


By other words, the search for a visible message is finished when
we find one or when the entire log buffer is checked.

As a consequence, the warning about "printk messages dropped" is
lost _only_ when there was no visible message in the entire buffer.

Therefore, the warning might get lost only when the log buffer
was very small or when console_level filter was too restricting.
I wonder if this is a case for a real user or a real life.
In each case, such a scenario most likely comes from a very
curious setup.

It is possible that your fix is fine. If we lose messages,
we are screwed anyway. But I still have problems to accept
that we would start printing less important messages (that would
normally be ignored) in situation when we have troubles
to print the more important ones. This logic rings warning
bells in my head and this is why I suggest more conservative
solution and ask the many questions.

Am I so wrong in my thinking, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-09 16:56   ` Petr Mladek
@ 2017-01-10  8:49     ` Sergey Senozhatsky
  2017-01-11 16:50       ` Petr Mladek
  0 siblings, 1 reply; 18+ messages in thread
From: Sergey Senozhatsky @ 2017-01-10  8:49 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Steven Rostedt, Peter Hurley,
	linux-kernel, Sergey Senozhatsky

On (01/09/17 17:56), Petr Mladek wrote:
> I have looked at this once again. Please, do you have a real-life
> example when the "printk messages dropped" was not printed?

.....


> By other words, the search for a visible message is finished when
> we find one or when the entire log buffer is checked.
> 
> As a consequence, the warning about "printk messages dropped" is
> lost _only_ when there was no visible message in the entire buffer.
> 
> Therefore, the warning might get lost only when the log buffer
> was very small or when console_level filter was too restricting.

with a slow serial console even a relatively large logbuf can
easily become a "not quite large enough" logbuf. especially
when the underlying uart_port ->lock is concurrently accessed
by uart circ_buf flushing.


> I wonder if this is a case for a real user or a real life.
> In each case, such a scenario most likely comes from a very
> curious setup.
> 
> It is possible that your fix is fine. If we lose messages,
> we are screwed anyway. But I still have problems to accept
> that we would start printing less important messages (that would
> normally be ignored) in situation when we have troubles
> to print the more important ones. This logic rings warning
> bells in my head and this is why I suggest more conservative
> solution and ask the many questions.

once the system is in "oh, let me drop some of the messages for you"
mood, loglevel filtering is unreliable and in some cases unneeded.
it's so unreliable that I'm even considering disabling it in *in-house*
builds when console_unlock() detects that there was no room for all
'yet to be seen' messages.


those are another messages, with 'visible' loglevel or with 'suppressed'
loglevel or both 'visible' and 'suppressed' loglevels, that caused the
logbuf overflow.

now, if the loss of messages was caused by:

a) flood of suppressed loglevel messages
   then printing at least some of those messages makes *a lot* of sense.
   it really does. because those messages will point at the root cause of
   the problem. and there is no other way of finding that out when all you
   have is a serial log of visible messages. those 'suppressed messages'
   part is the key. whatever loglevel it was, the fact that we overflowed
   the logbuf with those messages is already bad.



b) flood of visible loglevel messages
   then may be those messages are not so important. there a whole logbuf of
   them. per my experience, it is quite hard to overflow the logbuf with
   really important, unique, sensible messages of 'visible' loglevel with
   active loglevel filtering. in most of the cases I've seen, those "important"
   messages are the same repeating messages. OOM is the most notable example:
   it can flood logbuf with pretty much same backtraces and Mem-Info printouts.
   for instance - CPU that stuck in GFP_REPEAT allocation under OOM. or memory
   allocation error in swap out path under OOM:

   ...
   [3282.5292] Node    0, zone   XXXX, type      Isolate      0      0      0      0      0      0      0      0      0      0      0 
   [3282.5292] Node    0, zone   XXXX, type    Unmovable      0      0      0      0      0      0      0      0      0      0      0 
   ** 1389 printk messages dropped ** [3282.5430] Write-error on swap-device (254:0:1471944)
   ** 354 printk messages dropped ** [3282.5526] Write-error on swap-device (254:0:1473240)
   ** 254 printk messages dropped ** [3282.5598] Write-error on swap-device (254:0:1474472)
   [3282.5598] SLUB: Unable to allocate memory on xxxxxxx
   		blah-blah-blah
   ...
   [3282.5606] Write-error on swap-device (254:0:1474480)
   [3282.5607] Write-error on swap-device (254:0:1474496)
   [3282.5607] Write-error on swap-device (254:0:1474504)
   [3282.5608] Write-error on swap-device (254:0:1474504)
   [3282.5610] Write-error on swap-device (254:0:1474528)
   [3282.5611] Write-error on swap-device (254:0:1474544)
   ** 519 printk messages dropped ** [3282.5611] Write-error on swap-device (254:0:1477152)
   ** 15 printk messages dropped ** [3282.5616] Write-error on swap-device (254:0:1477208)
   ...
   [3283.0532] zram: Error allocating memory for compressed page: 200385, size=2824
   ...

   those messages preempt each other and preempt other *may be actually
   important* 'visible' loglevel messages. but there is no way to tell
   which one of the messages vprintk_emit() is about to overwrite is actually
   important. the outcome is -- the important repeating messages are _mostly_
   as valuable as the suppressed ones. IOW, not every important message is
   really important under 'out of logbuf' condition. and vice versa.



c) both a) and b)
   printing both suppressible and 'visible' loglevels makes sense here.


considering a) and b):

once the system is out of logbuf space it is impossible to clearly
distinguish between 'important' and 'not so important' messages. all
we know in console_unlock(), when we pick up next_idx message, is that
there is an abnormal/unusual/weird/unexpected/sick/whatever amount of
messages - 'suppressed' or 'visible' or both. and that's the problem.

"print as much of those messages as possible, don't hide the problem" - that's
why I'm thinking of no loglevel filtering in 'ouf of logbuf space' case.

	-ss

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-10  8:49     ` Sergey Senozhatsky
@ 2017-01-11 16:50       ` Petr Mladek
  2017-01-13  5:11         ` Sergey Senozhatsky
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Mladek @ 2017-01-11 16:50 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Steven Rostedt, Peter Hurley,
	linux-kernel

Hi Sergey,

first, thanks a lot for the detailed description. I have finally
understood what was important on the "non-important" messages
and how you used them. I am sorry that I was not able to get
it earlier.

On Tue 2017-01-10 17:49:39, Sergey Senozhatsky wrote:
> On (01/09/17 17:56), Petr Mladek wrote:
> > It is possible that your fix is fine. If we lose messages,
> > we are screwed anyway. But I still have problems to accept
> > that we would start printing less important messages (that would
> > normally be ignored) in situation when we have troubles
> > to print the more important ones. This logic rings warning
> > bells in my head and this is why I suggest more conservative
> > solution and ask the many questions.
>
> once the system is in "oh, let me drop some of the messages for you"
> mood, loglevel filtering is unreliable and in some cases unneeded.
> it's so unreliable that I'm even considering disabling it in *in-house*
> builds when console_unlock() detects that there was no room for all
> 'yet to be seen' messages.
> 
> those are another messages, with 'visible' loglevel or with 'suppressed'
> loglevel or both 'visible' and 'suppressed' loglevels, that caused the
> logbuf overflow.
> 
> now, if the loss of messages was caused by:
> 
> a) flood of suppressed loglevel messages
>    then printing at least some of those messages makes *a lot* of sense.
> 
> b) flood of visible loglevel messages
>    then may be those messages are not so important. there a whole logbuf of
>    them. per my experience, it is quite hard to overflow the logbuf with
>    really important, unique, sensible messages of 'visible' loglevel with
>    active loglevel filtering.

Just for record, I guess that the same is true also for the messages
with lower level. I mean that they are repeating as well. It would be
great to make it easier to throttle the same messages or do it a
generic way. But this a food for the future work.

> once the system is out of logbuf space it is impossible to clearly
> distinguish between 'important' and 'not so important' messages. all
> we know in console_unlock(), when we pick up next_idx message, is that
> there is an abnormal/unusual/weird/unexpected/sick/whatever amount of
> messages - 'suppressed' or 'visible' or both. and that's the problem.

It is true that lost messages is a "serious" problem because you might
miss message about a "really" serious problem. The normally important
messages are less useful because they are incomplete. It makes sense
to debug what causes the flood. The key is to ignore loglevel and
print what is being stored.

Your patch makes perfect sense from this point of view. Please,
mention such an explanation in the next iteration of the patch.


Ah, you will kill me. I still have one thing. The levels are defined
like this:

#define KERN_EMERG	KERN_SOH "0"	/* system is unusable */
#define KERN_ALERT	KERN_SOH "1"	/* action must be taken immediately */
#define KERN_CRIT	KERN_SOH "2"	/* critical conditions */
#define KERN_ERR	KERN_SOH "3"	/* error conditions */

The flood of messages usually means something pretty wrong. But
it might also be caused by too many or forgotten debug messages.

It think that lost messages belong to the level "2". Note that
the warning about lost NMI messages and recent printk recursion
were printed with loglevel '2' as well.

Would it make sense and be acceptable to ignore the log level
only when console_level allows to show KERN_CRIT messages?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] printk: always report lost messages on serial console
  2017-01-11 16:50       ` Petr Mladek
@ 2017-01-13  5:11         ` Sergey Senozhatsky
  0 siblings, 0 replies; 18+ messages in thread
From: Sergey Senozhatsky @ 2017-01-13  5:11 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Steven Rostedt, Peter Hurley, linux-kernel

Hi,

On (01/11/17 17:50), Petr Mladek wrote:
> Hi Sergey,
> 
> first, thanks a lot for the detailed description. I have finally
> understood what was important on the "non-important" messages
> and how you used them. I am sorry that I was not able to get
> it earlier.

sure, no prob. I was mostly offline for the past few days for personal
reasons but now I'm back.

[..]
> > now, if the loss of messages was caused by:
> > 
> > a) flood of suppressed loglevel messages
> >    then printing at least some of those messages makes *a lot* of sense.
> > 
> > b) flood of visible loglevel messages
> >    then may be those messages are not so important. there a whole logbuf of
> >    them. per my experience, it is quite hard to overflow the logbuf with
> >    really important, unique, sensible messages of 'visible' loglevel with
> >    active loglevel filtering.
> 
> Just for record, I guess that the same is true also for the messages
> with lower level. I mean that they are repeating as well.

right. those are 100% reproducible, quick to spot and easy to fix, I
guess. a spontaneous explosion is a different/bigger problem. though
the end result is the same -- we lose messages, may be very important
ones. our best effort/goal is to print logbuf content. that's why we
are playing with nmi/safe printk; zap locks; ignore locks state in
some cases; and so on and on. but when we lose messages that were
meant to be printed even _before_ we try to print them out, then our
best effort is sort of void/undefined.


> It would be great to make it easier to throttle the same messages or
> do it a generic way. But this a food for the future work.

yes. syslog tracks "duplicate messages". but I kinda couldn't understand
how helpful it will be in vprintk_emit() /* because it's too late to track
duplicates in console_unlock(). duplicates should not be stored in multiple
instances in the first place */. backtraces are hard to suppress, besides we
shouldn't suppress backtraces I think. log_store() would have to strcmp() or
"hash+compare hashes" current message and the most recent logbuf message.
but bigger concern is -- do people see dropped messages that often to add
duplicate messages tracker to vprintk_emit()? I see dropped messages quite
a lot, but that's just my setup.


> #define KERN_EMERG	KERN_SOH "0"	/* system is unusable */
> #define KERN_ALERT	KERN_SOH "1"	/* action must be taken immediately */
> #define KERN_CRIT	KERN_SOH "2"	/* critical conditions */
> #define KERN_ERR	KERN_SOH "3"	/* error conditions */
> 
> The flood of messages usually means something pretty wrong. But
> it might also be caused by too many or forgotten debug messages.

well. it's still really a lot of forgotten messages. so much that
we have to drop other messages. so I'd say the root cause is less
important (if important at all) as long as the result is "lost messages".


> It think that lost messages belong to the level "2". Note that
> the warning about lost NMI messages and recent printk recursion
> were printed with loglevel '2' as well.
> 
> Would it make sense and be acceptable to ignore the log level
> only when console_level allows to show KERN_CRIT messages?

need to think. what will it improve?


// I'm catching up with the emails, it'll take some time.

	-ss

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-01-13  5:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-24 14:09 [PATCH 0/2] printk: always report dropped messages Sergey Senozhatsky
2016-12-24 14:09 ` [PATCH 1/2] printk: drop call_console_drivers() unused param Sergey Senozhatsky
2017-01-03 10:22   ` Petr Mladek
2017-01-09 13:49     ` Petr Mladek
2016-12-24 14:09 ` [PATCH 2/2] printk: always report lost messages on serial console Sergey Senozhatsky
2017-01-03 14:55   ` Petr Mladek
2017-01-03 15:47     ` Sergey Senozhatsky
2017-01-03 16:53       ` Petr Mladek
2017-01-04  2:46         ` Sergey Senozhatsky
2017-01-04 10:52           ` Petr Mladek
2017-01-04 13:34             ` Sergey Senozhatsky
2017-01-04 15:26               ` Petr Mladek
2017-01-05  2:30                 ` Sergey Senozhatsky
2017-01-05 10:53                   ` Petr Mladek
2017-01-09 16:56   ` Petr Mladek
2017-01-10  8:49     ` Sergey Senozhatsky
2017-01-11 16:50       ` Petr Mladek
2017-01-13  5:11         ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).