* [PATCH] printk: Add best-effort printk() buffering.
@ 2017-04-30 13:54 Tetsuo Handa
  2017-04-30 16:11 ` Joe Perches
  2017-05-08  7:05 ` Sergey Senozhatsky
  0 siblings, 2 replies; 11+ messages in thread
From: Tetsuo Handa @ 2017-04-30 13:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tetsuo Handa, Ingo Molnar, Peter Zijlstra, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt

Sometimes we want to printk() multiple lines as a group, without being
interleaved by concurrent printk() calls from interrupts and/or other
threads. For example, when the dumps of multiple threads are mixed
together, the printk() output becomes hard to interpret.

This patch introduces fixed-size, statically allocated buffers for
buffering printk() output for each thread/context on a best-effort basis
(i.e. up to PAGE_SIZE bytes per buffer, up to 16 concurrent printk() users).

A typical thread dump is likely to fit in PAGE_SIZE bytes. If it exceeds
PAGE_SIZE bytes, it is printed as multiple groups, keeping as many lines
together as possible.

If there happen to be more than 16 concurrent printk() users, plain
printk() is used for the users which failed to acquire a buffer. Of course,
with more than 16 concurrent printk() users, the printk() output would
flood the console and the system would already be unusable. Thus, 16
buffers should be sufficient unless, for example, hundreds of
memory-allocating threads concurrently call warn_alloc() to report
allocation stalls on a large system.

Three functions (get_printk_buffer(), flush_printk_buffer() and
put_printk_buffer()) are provided for printk() buffering.

  get_printk_buffer() tries to assign a buffer to the calling
  thread/context.

  flush_printk_buffer() flushes the buffer of the calling thread/context.
  flush_printk_buffer() is a no-op if no buffer was assigned.

  put_printk_buffer() flushes and releases the buffer of the calling
  thread/context. put_printk_buffer() is a no-op if no buffer was
  assigned.

These functions are safe to call from NMI context, hard IRQ context,
soft IRQ context, or task context. However, they become no-ops when called
from NMI context, because vprintk_func() calls vprintk_nmi() rather than
vprintk_default() (where printk() buffering takes place) when printk() is
called from NMI context.

get_printk_buffer() and put_printk_buffer() can nest, like rcu_read_lock()
and rcu_read_unlock(), and each put_printk_buffer() must match a
corresponding get_printk_buffer(), just as each rcu_read_unlock() must
match a corresponding rcu_read_lock().

This patch calls these functions from show_state_filter() as an example.
By avoiding many small writes thanks to the buffers, the average of 10 runs of

  # time echo t > /proc/sysrq-trigger

showed a 25% improvement for the default console (no console= argument) case

  buffered:    798.2ms
  unbuffered: 1070.4ms

and a 10% improvement for the console=ttyS0,115200n8 console=tty0 case

  buffered:    2915.5ms
  unbuffered:  3272.1ms

in my environment (VMware Player, 4CPUs, 2048MB RAM).

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/printk.h |  12 +++
 kernel/printk/printk.c | 243 +++++++++++++++++++++++++++++++++++++++++++------
 kernel/sched/core.c    |   8 +-
 lib/dump_stack.c       |   2 +
 4 files changed, 238 insertions(+), 27 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index e10f274..da35a01 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -170,6 +170,9 @@ int printk_emit(int facility, int level,
 
 asmlinkage __printf(1, 2) __cold
 int printk(const char *fmt, ...);
+void get_printk_buffer(void);
+void flush_printk_buffer(void);
+void put_printk_buffer(void);
 
 /*
  * Special printk facility for scheduler/timekeeping use only, _DO_NOT_USE_ !
@@ -217,6 +220,15 @@ int printk(const char *s, ...)
 {
 	return 0;
 }
+static inline void get_printk_buffer(void)
+{
+}
+static inline void flush_printk_buffer(void)
+{
+}
+static inline void put_printk_buffer(void)
+{
+}
 static inline __printf(1, 2) __cold
 int printk_deferred(const char *s, ...)
 {
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fb2d159..4f5663f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1693,33 +1693,10 @@ static size_t log_output(int facility, int level, enum log_flags lflags, const c
 	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
 }
 
-asmlinkage int vprintk_emit(int facility, int level,
-			    const char *dict, size_t dictlen,
-			    const char *fmt, va_list args)
+static int process_log(int facility, int level, const char *dict,
+		       size_t dictlen, char *text, size_t text_len)
 {
-	static char textbuf[LOG_LINE_MAX];
-	char *text = textbuf;
-	size_t text_len = 0;
 	enum log_flags lflags = 0;
-	unsigned long flags;
-	int printed_len = 0;
-	bool in_sched = false;
-
-	if (level == LOGLEVEL_SCHED) {
-		level = LOGLEVEL_DEFAULT;
-		in_sched = true;
-	}
-
-	boot_delay_msec(level);
-	printk_delay();
-
-	/* This stops the holder of console_sem just where we want him */
-	logbuf_lock_irqsave(flags);
-	/*
-	 * The printf needs to come first; we need the syslog
-	 * prefix which might be passed-in as a parameter.
-	 */
-	text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
 
 	/* mark and strip a trailing newline */
 	if (text_len && text[text_len-1] == '\n') {
@@ -1755,7 +1732,39 @@ asmlinkage int vprintk_emit(int facility, int level,
 	if (dict)
 		lflags |= LOG_PREFIX|LOG_NEWLINE;
 
-	printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);
+	return log_output(facility, level, lflags, dict, dictlen, text,
+			  text_len);
+}
+
+asmlinkage int vprintk_emit(int facility, int level,
+			    const char *dict, size_t dictlen,
+			    const char *fmt, va_list args)
+{
+	static char textbuf[LOG_LINE_MAX];
+	char *text = textbuf;
+	size_t text_len = 0;
+	unsigned long flags;
+	int printed_len = 0;
+	bool in_sched = false;
+
+	if (level == LOGLEVEL_SCHED) {
+		level = LOGLEVEL_DEFAULT;
+		in_sched = true;
+	}
+
+	boot_delay_msec(level);
+	printk_delay();
+
+	/* This stops the holder of console_sem just where we want him */
+	logbuf_lock_irqsave(flags);
+	/*
+	 * The printf needs to come first; we need the syslog
+	 * prefix which might be passed-in as a parameter.
+	 */
+	text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
+
+	printed_len = process_log(facility, level, dict, dictlen, text,
+				  text_len);
 
 	logbuf_unlock_irqrestore(flags);
 
@@ -1795,8 +1804,153 @@ asmlinkage int printk_emit(int facility, int level,
 }
 EXPORT_SYMBOL(printk_emit);
 
+#define MAX_PRINTK_BUFFERS 16
+static struct printk_buffer {
+	unsigned long context; /* printk_context() */
+	unsigned int nested;
+	unsigned int used; /* Valid bytes in buf[]. */
+	char buf[PAGE_SIZE];
+} printk_buffers[MAX_PRINTK_BUFFERS];
+
+/* Must not be called from NMI context. */
+static void __flush_printk_buffer(struct printk_buffer *ptr, bool all)
+{
+	unsigned long flags;
+
+	if (!ptr->used)
+		return;
+
+	/*
+	 * Since printk_deferred() directly calls vprintk_emit(LOGLEVEL_SCHED),
+	 * this function does not need to care about LOGLEVEL_SCHED case.
+	 * Therefore, it is safe to call console_trylock() + console_unlock().
+	 *
+	 * We don't call boot_delay_msec(level) here because level is unknown.
+	 */
+	printk_delay();
+
+	/* This stops the holder of console_sem just where we want him */
+	logbuf_lock_irqsave(flags);
+	while (1) {
+		char *text = ptr->buf;
+		unsigned int text_len = ptr->used;
+		char *cp = memchr(text, '\n', text_len);
+		char c;
+
+		if (cp++)
+			text_len = cp - text;
+		else if (all)
+			cp = text + text_len;
+		else
+			break;
+		/* printk_get_level() depends on text '\0'-terminated. */
+		c = *cp;
+		*cp = '\0';
+		process_log(0, LOGLEVEL_DEFAULT, NULL, 0, text, text_len);
+		ptr->used -= text_len;
+		if (!ptr->used)
+			break;
+		*cp = c;
+		memmove(text, text + text_len, ptr->used);
+	}
+	logbuf_unlock_irqrestore(flags);
+	/*
+	 * Try to acquire and then immediately release the console
+	 * semaphore.  The release will print out buffers and wake up
+	 * /dev/kmsg and syslog() users.
+	 */
+	if (console_trylock())
+		console_unlock();
+}
+
+static unsigned long printk_context(void)
+{
+	/*
+	 * Assume that we can use lower 2 bits for flags, as with
+	 * __mutex_owner() does.
+	 */
+	unsigned long context = (unsigned long) current;
+
+	/* Both bits set means processing NMI context. */
+	if (in_nmi())
+		context |= 3;
+	/* Only next-LSB set means processing hard IRQ context. */
+	else if (in_irq())
+		context |= 2;
+	/* Only LSB set means processing soft IRQ context. */
+	else if (in_serving_softirq())
+		context |= 1;
+	/*
+	 * Neither bits set means processing task context,
+	 * though still might be non sleepable context.
+	 */
+	return context;
+}
+
+static struct printk_buffer *find_printk_buffer(void)
+{
+	const unsigned long context = printk_context();
+	int i;
+
+	/* No-op if called from NMI context. */
+	if ((context & 3) == 3)
+		return NULL;
+	for (i = 0; i < MAX_PRINTK_BUFFERS; i++)
+		if (context == printk_buffers[i].context)
+			return &printk_buffers[i];
+	return NULL;
+}
+
+void get_printk_buffer(void)
+{
+	const unsigned long context = printk_context();
+	int i;
+
+	/* No-op if called from NMI context. */
+	if ((context & 3) == 3)
+		return;
+	for (i = 0; i < MAX_PRINTK_BUFFERS; i++) {
+		struct printk_buffer *ptr = &printk_buffers[i];
+
+		if (ptr->context != context) {
+			if (ptr->context ||
+			    cmpxchg(&ptr->context, 0, context))
+				continue;
+			ptr->nested = 0;
+			ptr->used = 0;
+		} else {
+			ptr->nested++;
+		}
+		break;
+	}
+}
+EXPORT_SYMBOL(get_printk_buffer);
+
+void flush_printk_buffer(void)
+{
+	struct printk_buffer *ptr = find_printk_buffer();
+
+	if (ptr)
+		__flush_printk_buffer(ptr, true);
+}
+EXPORT_SYMBOL(flush_printk_buffer);
+
+void put_printk_buffer(void)
+{
+	struct printk_buffer *ptr = find_printk_buffer();
+
+	if (!ptr || ptr->nested--)
+		return;
+	__flush_printk_buffer(ptr, true);
+	xchg(&ptr->context, 0);
+}
+EXPORT_SYMBOL(put_printk_buffer);
+
 int vprintk_default(const char *fmt, va_list args)
 {
+	struct printk_buffer *ptr;
+	va_list args2;
+	unsigned int i;
 	int r;
 
 #ifdef CONFIG_KGDB_KDB
@@ -1806,6 +1960,43 @@ int vprintk_default(const char *fmt, va_list args)
 		return r;
 	}
 #endif
+	ptr = find_printk_buffer();
+	if (!ptr)
+		goto use_unbuffered;
+	/*
+	 * Try to store to printk_buffer first. If it fails, flush completed
+	 * lines in printk_buffer, and then try to store to printk_buffer
+	 * again. If it still fails, flush incomplete line in printk_buffer
+	 * and use unbuffered printing.
+	 *
+	 * Since printk_buffer is identified by current thread and interrupt
+	 * context and same level of context does not recurse, we don't need
+	 * logbuf_lock_irqsave()/logbuf_unlock_irqrestore() here except
+	 * __flush_printk_buffer().
+	 */
+	for (i = 0; i < 2; i++) {
+		unsigned int pos = ptr->used;
+		char *text = ptr->buf + pos;
+
+		va_copy(args2, args);
+		r = vsnprintf(text, sizeof(ptr->buf) - pos, fmt, args2);
+		va_end(args2);
+		if (r + pos < sizeof(ptr->buf)) {
+			/*
+			 * Eliminate KERN_CONT at this point because we can
+			 * concatenate incomplete lines inside printk_buffer.
+			 */
+			if (r >= 2 && printk_get_level(text) == 'c') {
+				memmove(text, text + 2, r - 2);
+				ptr->used += r - 2;
+			} else {
+				ptr->used += r;
+			}
+			return r;
+		}
+		__flush_printk_buffer(ptr, i);
+	}
+use_unbuffered:
 	r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
 
 	return r;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 39b9a279..54517f6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5305,6 +5305,7 @@ void sched_show_task(struct task_struct *p)
 
 	if (!try_get_task_stack(p))
 		return;
+	get_printk_buffer();
 	if (state)
 		state = __ffs(state) + 1;
 	printk(KERN_INFO "%-15.15s %c", p->comm,
@@ -5326,12 +5327,14 @@ void sched_show_task(struct task_struct *p)
 	print_worker_info(KERN_INFO, p);
 	show_stack(p, NULL);
 	put_task_stack(p);
+	put_printk_buffer();
 }
 
 void show_state_filter(unsigned long state_filter)
 {
 	struct task_struct *g, *p;
 
+	get_printk_buffer();
 #if BITS_PER_LONG == 32
 	printk(KERN_INFO
 		"  task                PC stack   pid father\n");
@@ -5350,8 +5353,10 @@ void show_state_filter(unsigned long state_filter)
 		 */
 		touch_nmi_watchdog();
 		touch_all_softlockup_watchdogs();
-		if (!state_filter || (p->state & state_filter))
+		if (!state_filter || (p->state & state_filter)) {
 			sched_show_task(p);
+			flush_printk_buffer();
+		}
 	}
 
 #ifdef CONFIG_SCHED_DEBUG
@@ -5364,6 +5369,7 @@ void show_state_filter(unsigned long state_filter)
 	 */
 	if (!state_filter)
 		debug_show_all_locks();
+	put_printk_buffer();
 }
 
 void init_idle_bootup_task(struct task_struct *idle)
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 625375e..da0e64c 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -12,8 +12,10 @@
 
 static void __dump_stack(void)
 {
+	get_printk_buffer();
 	dump_stack_print_info(KERN_DEFAULT);
 	show_stack(NULL, NULL);
+	put_printk_buffer();
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-04-30 13:54 [PATCH] printk: Add best-effort printk() buffering Tetsuo Handa
@ 2017-04-30 16:11 ` Joe Perches
  2017-05-03  6:21   ` Tetsuo Handa
  2017-05-08  7:05 ` Sergey Senozhatsky
  1 sibling, 1 reply; 11+ messages in thread
From: Joe Perches @ 2017-04-30 16:11 UTC (permalink / raw)
  To: Tetsuo Handa, linux-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Petr Mladek, Sergey Senozhatsky,
	Steven Rostedt

On Sun, 2017-04-30 at 22:54 +0900, Tetsuo Handa wrote:
> Sometimes we want to printk() multiple lines in a group without being
> disturbed by concurrent printk() from interrupts and/or other threads.
> For example, mixed printk() output of multiple thread's dump makes it
> hard to interpret.
> 
> This patch introduces fixed-sized statically allocated buffers for
> buffering printk() output for each thread/context in best effort
> (i.e. up to PAGE_SIZE bytes, up to 16 concurrent printk() users).
[]
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
[]
> +#define MAX_PRINTK_BUFFERS 16
> +static struct printk_buffer {
> +	unsigned long context; /* printk_context() */
> +	unsigned int nested;
> +	unsigned int used; /* Valid bytes in buf[]. */
> +	char buf[PAGE_SIZE];
> +} printk_buffers[MAX_PRINTK_BUFFERS];

Perhaps these buffers could be acquired via
alloc_page() rather than being static
structures, and the size of buf[PAGE_SIZE]
should be reduced by
	sizeof(unsigned long) +
	sizeof(unsigned int) +
	sizeof(unsigned int)
so that struct printk_buffer is exactly
PAGE_SIZE.


* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-04-30 16:11 ` Joe Perches
@ 2017-05-03  6:21   ` Tetsuo Handa
  2017-05-03  9:30     ` Joe Perches
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2017-05-03  6:21 UTC (permalink / raw)
  To: joe, linux-kernel; +Cc: mingo, peterz, pmladek, sergey.senozhatsky, rostedt

Joe Perches wrote:
> On Sun, 2017-04-30 at 22:54 +0900, Tetsuo Handa wrote:
> > Sometimes we want to printk() multiple lines in a group without being
> > disturbed by concurrent printk() from interrupts and/or other threads.
> > For example, mixed printk() output of multiple thread's dump makes it
> > hard to interpret.
> > 
> > This patch introduces fixed-sized statically allocated buffers for
> > buffering printk() output for each thread/context in best effort
> > (i.e. up to PAGE_SIZE bytes, up to 16 concurrent printk() users).
> []
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> []
> > +#define MAX_PRINTK_BUFFERS 16
> > +static struct printk_buffer {
> > +	unsigned long context; /* printk_context() */
> > +	unsigned int nested;
> > +	unsigned int used; /* Valid bytes in buf[]. */
> > +	char buf[PAGE_SIZE];
> > +} printk_buffers[MAX_PRINTK_BUFFERS];
> 
> Perhaps these buffers could be acquired by
> alloc_page rather than be static structures and
> the sizeof buf[PAGE_SIZE] should be reduced by
> 	sizeof(unsigned long) +
> 	sizeof(unsigned int) +
> 	sizeof(unsigned int)
> so that struct printk_buffers is exactly
> PAGE_SIZE.

When should the buffers be allocated? If at boot, there will be little
difference. If the first time each buffer is needed, we introduce a risk
of failing to allocate memory with alloc_page(GFP_ATOMIC | __GFP_NOWARN),
and a risk of stack overflow inside alloc_page(), because printk() has to
be prepared to be called from stack-tight situations.

Also, while dynamic allocation would allow a linked list of buffers, we
would need to introduce a lock for traversing the list, which might become
more expensive than walking a fixed-size array of buffers.
We could avoid list traversal by passing a "struct printk_buffer" argument
around, but since there are so many functions which expect pr_cont()
behavior, scattering a "struct printk_buffer" argument would be a big patch.
We could avoid list traversal by storing "struct printk_buffer *[3]" inside
"struct task_struct" (for hard IRQ context, soft IRQ context and task
context), but occupying sizeof(struct printk_buffer *) * 3 bytes only for
buffers which are used only for a moment is not smart.

Thus, I think fixed-size, statically allocated buffers are the most
reasonable choice. Using a CONFIG_ option to control how many pages are
allocated for "struct printk_buffer" might make sense for systems with
little RAM.

Tetsuo Handa wrote:
> +void get_printk_buffer(void)
> +{
> +	const unsigned long context = printk_context();
> +	int i;
> +
> +	/* No-op if called from NMI context. */
> +	if ((context & 3) == 3)
> +		return;
> +	for (i = 0; i < MAX_PRINTK_BUFFERS; i++) {
> +		struct printk_buffer *ptr = &printk_buffers[i];
> +
> +		if (ptr->context != context) {
> +			if (ptr->context ||
> +			    cmpxchg(&ptr->context, 0, context))
> +				continue;
> +			ptr->nested = 0;
> +			ptr->used = 0;
> +		} else {
> +			ptr->nested++;
> +		}
> +		break;
> +	}
> +}

Oops, I over-optimized here. We should not reserve a ptr->context == 0 slot
before checking the ptr->context != 0 slots, or we may race with nested calls.

void get_printk_buffer(void)
{
	const unsigned long context = printk_context();
	struct printk_buffer *ptr;
	int i;

	/* No-op if called from NMI context. */
	if ((context & 3) == 3)
		return;
	ptr = find_printk_buffer();
	if (ptr) {
		ptr->nested++;
		return;
	}
	for (i = 0; i < MAX_PRINTK_BUFFERS; i++) {
		ptr = &printk_buffers[i];
		if (ptr->context || cmpxchg(&ptr->context, 0, context))
			continue;
		ptr->nested = 0;
		ptr->used = 0;
		break;
	}
}



>  int vprintk_default(const char *fmt, va_list args)
>  {
> +	struct printk_buffer *ptr;
> +	va_list args2;
> +	unsigned int i;
>  	int r;
>  
>  #ifdef CONFIG_KGDB_KDB
> @@ -1806,6 +1960,43 @@ int vprintk_default(const char *fmt, va_list args)
>  		return r;
>  	}
>  #endif
> +	ptr = find_printk_buffer();
> +	if (!ptr)
> +		goto use_unbuffered;
> +	/*
> +	 * Try to store to printk_buffer first. If it fails, flush completed
> +	 * lines in printk_buffer, and then try to store to printk_buffer
> +	 * again. If it still fails, flush incomplete line in printk_buffer
> +	 * and use unbuffered printing.
> +	 *
> +	 * Since printk_buffer is identified by current thread and interrupt
> +	 * context and same level of context does not recurse, we don't need
> +	 * logbuf_lock_irqsave()/logbuf_unlock_irqrestore() here except
> +	 * __flush_printk_buffer().
> +	 */

Before I continue, I'd like to confirm that this assumption is correct.
My understanding is that

  (1) task context can be preempted by soft IRQ context, hard IRQ context and NMI context.
  (2) soft IRQ context can be preempted by hard IRQ context and NMI context.
  (3) hard IRQ context can be preempted by NMI context.
  (4) a kernel-oops event can interrupt task context, soft IRQ context, hard IRQ context
      and NMI context, but the interrupted context cannot continue execution of
      vprintk_default() after returning from the kernel-oops event, even if the
      kernel-oops event occurred in schedulable context and panic_on_oops == 0.

and as a result, it is guaranteed that a kernel-oops event which occurred between

> +	for (i = 0; i < 2; i++) {

                here

> +		unsigned int pos = ptr->used;
> +		char *text = ptr->buf + pos;
> +
> +		va_copy(args2, args);
> +		r = vsnprintf(text, sizeof(ptr->buf) - pos, fmt, args2);
> +		va_end(args2);
> +		if (r + pos < sizeof(ptr->buf)) {
> +			/*
> +			 * Eliminate KERN_CONT at this point because we can
> +			 * concatenate incomplete lines inside printk_buffer.
> +			 */
> +			if (r >= 2 && printk_get_level(text) == 'c') {
> +				memmove(text, text + 2, r - 2);
> +				ptr->used += r - 2;
> +			} else {
> +				ptr->used += r;
> +			}
> +			return r;
> +		}

                and here
shall not make ptr->used >= sizeof(ptr->buf) true.

> +		__flush_printk_buffer(ptr, i);
> +	}
> +use_unbuffered:
>  	r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
>  
>  	return r;

Well, should vprintk_default() refrain from buffering if oops_in_progress == 1?


* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-05-03  6:21   ` Tetsuo Handa
@ 2017-05-03  9:30     ` Joe Perches
  0 siblings, 0 replies; 11+ messages in thread
From: Joe Perches @ 2017-05-03  9:30 UTC (permalink / raw)
  To: Tetsuo Handa, linux-kernel
  Cc: mingo, peterz, pmladek, sergey.senozhatsky, rostedt

On Wed, 2017-05-03 at 15:21 +0900, Tetsuo Handa wrote:
> Joe Perches wrote:
> > On Sun, 2017-04-30 at 22:54 +0900, Tetsuo Handa wrote:
> > > Sometimes we want to printk() multiple lines in a group without being
> > > disturbed by concurrent printk() from interrupts and/or other threads.
> > > For example, mixed printk() output of multiple thread's dump makes it
> > > hard to interpret.
> > > 
> > > This patch introduces fixed-sized statically allocated buffers for
> > > buffering printk() output for each thread/context in best effort
> > > (i.e. up to PAGE_SIZE bytes, up to 16 concurrent printk() users).
> > 
> > []
> > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > 
> > []
> > > +#define MAX_PRINTK_BUFFERS 16
> > > +static struct printk_buffer {
> > > +	unsigned long context; /* printk_context() */
> > > +	unsigned int nested;
> > > +	unsigned int used; /* Valid bytes in buf[]. */
> > > +	char buf[PAGE_SIZE];
> > > +} printk_buffers[MAX_PRINTK_BUFFERS];
> > 
> > Perhaps these buffers could be acquired by
> > alloc_page rather than be static structures and
> > the sizeof buf[PAGE_SIZE] should be reduced by
> > 	sizeof(unsigned long) +
> > 	sizeof(unsigned int) +
> > 	sizeof(unsigned int)
> > so that struct printk_buffers is exactly
> > PAGE_SIZE.
> 
> When should the buffers be allocated? If upon boot, there will be little
> difference. If the first time each buffer is needed, we introduce a risk
> of failing to allocate memory using alloc_page(GFP_ATOMIC | __GFP_NOWARN)
> and a risk of stack overflow during alloc_page() because printk() has to be
> prepared for being called from stack-tight situation.

This is supposed to be best effort anyway.

> Also, while dynamic allocation can allow linked list of the buffer, we
> will need to introduce a lock for traversing the list, which might become
> more expensive than walking fixed-sized array of the buffer.

Shouldn't matter as this is supposed to be best effort
and any printk is already quite expensive and not fast-path.

> We could avoid list traversal by passing "struct printk_buffer" argument,
> but since there are so many functions which expect pr_cont() behavior,
> scattering "struct printk_buffer" argument is a big patch.

It would be possible to add a pid instead.

> Thus, I think fixed-sized statically allocated buffers is the most
> reasonable choice. Using a CONFIG_ option for controlling how many pages
> should be allocated for "struct printk_buffer" might make sense for systems
> with little RAM.

Simpler, but not better.


* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-04-30 13:54 [PATCH] printk: Add best-effort printk() buffering Tetsuo Handa
  2017-04-30 16:11 ` Joe Perches
@ 2017-05-08  7:05 ` Sergey Senozhatsky
  2017-05-08 13:05   ` Tetsuo Handa
  1 sibling, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2017-05-08  7:05 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Petr Mladek,
	Sergey Senozhatsky, Steven Rostedt

Hello,

sorry for the delay.

On (04/30/17 22:54), Tetsuo Handa wrote:
> Sometimes we want to printk() multiple lines in a group without being
> disturbed by concurrent printk() from interrupts and/or other threads.
> For example, mixed printk() output of multiple thread's dump makes it
> hard to interpret.

hm, it's very close to what printk-safe does [and printk-nmi, of course].
the difference is that buffered-printk does not disable local IRQs,
unlike printk-safe, which has to do it by design. so the question is,
can buffered-printk impose atomicity requirements? it seems that it can
(am I wrong?). and, if so, then can we use printk-safe instead? we can
add a new printk_buffered_begin/printk_buffered_end API, for example,
(or enter/exit) for that purpose, that would set a buffered-printk
`printk_context' bit so we can flush buffers in a "special way", not via IRQ
work, and maybe avoid message loss (printk-safe buffers are bigger in size
than the proposed PAGE_SIZE buffers).


> This patch introduces fixed-sized statically allocated buffers for
> buffering printk() output for each thread/context in best effort
> (i.e. up to PAGE_SIZE bytes, up to 16 concurrent printk() users).

hm, 16 is rather random; it's too much for UP and probably not enough for
a 240-CPU system. for the time being there are 3 buffered-printk users
(as far as I can see), but who knows how more will be added in the future.
each CPU can have overlapping printks from process, IRQ and NMI contexts.
for NMI we use printk-nmi buffers, so it's out of the list; but, in general,
*it seems* that we better depend on the number of CPUs the system has.
which, once again, returns us back to printk-safe...

thoughts?


[..]

> +/* Must not be called from NMI context. */
> +static void __flush_printk_buffer(struct printk_buffer *ptr, bool all)
> +{
> +	unsigned long flags;
> +
> +	if (!ptr->used)
> +		return;
> +
> +	/*
> +	 * Since printk_deferred() directly calls vprintk_emit(LOGLEVEL_SCHED),
> +	 * this function does not need to care about LOGLEVEL_SCHED case.
> +	 * Therefore, it is safe to call console_trylock() + console_unlock().
> +	 *
> +	 * We don't call boot_delay_msec(level) here because level is unknown.
> +	 */
> +	printk_delay();
> +
> +	/* This stops the holder of console_sem just where we want him */
> +	logbuf_lock_irqsave(flags);
> +	while (1) {
> +		char *text = ptr->buf;
> +		unsigned int text_len = ptr->used;
> +		char *cp = memchr(text, '\n', text_len);
> +		char c;

what guarantees that there'll always be a terminating newline?

	-ss


* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-05-08  7:05 ` Sergey Senozhatsky
@ 2017-05-08 13:05   ` Tetsuo Handa
  2017-05-09  1:04     ` Sergey Senozhatsky
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2017-05-08 13:05 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: linux-kernel, mingo, peterz, pmladek, sergey.senozhatsky, rostedt

Sergey Senozhatsky wrote:
> sorry for the delay.

No problem.

> 
> On (04/30/17 22:54), Tetsuo Handa wrote:
> > Sometimes we want to printk() multiple lines in a group without being
> > disturbed by concurrent printk() from interrupts and/or other threads.
> > For example, mixed printk() output of multiple thread's dump makes it
> > hard to interpret.
> 
> hm, it's very close to what printk-safe does [and printk-nmi, of course].
> the difference is that buffered-printk does not disable local IRQs,
> unlike printk-safe, which has to do it by design. so the question is,
> can buffered-printk impose atomicity requirements? it seems that it can
> (am I wrong?). and, if so, then can we use printk-safe instead? we can
> add a new printk_buffered_begin/printk_buffered_end API, for example,
> (or enter/exit) for that purpose, that would set a buffered-printk
> `printk_context' bit so we can flush buffers in a "special way", not via IRQ
> work, and may be avoid message loss (printk-safe buffers are bigger in size
> than proposed PAGE_SIZE buffers).

printk_buffered_begin()/printk_buffered_end() corresponds to
get_printk_buffer()/put_printk_buffer().
printk_context() distinguishes atomic contexts.
flush_printk_buffer() flushes from non-NMI context.

What do the atomicity requirements mean?

> 
> 
> > This patch introduces fixed-sized statically allocated buffers for
> > buffering printk() output for each thread/context in best effort
> > (i.e. up to PAGE_SIZE bytes, up to 16 concurrent printk() users).
> 
> hm, 16 is rather random, it's too much for UP and probably not enough for
> a 240 CPUs system. for the time being there are 3 buffered-printk users
> (as far as I can see), but who knows how more will be added in the future.
> each CPU can have overlapping printks from process, IRQ and NMI contexts.
> for NMI we use printk-nmi buffers, so it's out of the list; but, in general,
> *it seems* that we better depend on the number of CPUs the system has.
> which, once again, returns us back to printk-safe...
> 
> thoughts?

I can make 16 a CONFIG_ option. Would you read 201705031521.EIJ39594.MFtOVOHSFLFOJQ@I-love.SAKURA.ne.jp ?
But as long as actually writing to console devices is slow, message loss
is inevitable no matter how big a buffer is used. Rather, I'd expect an API
which allows printk() users in schedulable context (e.g. kmallocwd and/or
warn_alloc() for reporting allocation stalls) to wait until the output is
written to console devices. That would be more likely to reduce message loss.

> > +	while (1) {
> > +		char *text = ptr->buf;
> > +		unsigned int text_len = ptr->used;
> > +		char *cp = memchr(text, '\n', text_len);
> > +		char c;
> 
> what guarantees that there'll always be a terminating newline?

Nothing guarantees it. Why would we need such a guarantee?


* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-05-08 13:05   ` Tetsuo Handa
@ 2017-05-09  1:04     ` Sergey Senozhatsky
  2017-05-09 11:41       ` Tetsuo Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2017-05-09  1:04 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, linux-kernel, mingo, peterz, pmladek,
	sergey.senozhatsky, rostedt

Hello,

On (05/08/17 22:05), Tetsuo Handa wrote:
> > On (04/30/17 22:54), Tetsuo Handa wrote:
> > > Sometimes we want to printk() multiple lines in a group without being
> > > disturbed by concurrent printk() from interrupts and/or other threads.
> > > For example, mixed printk() output of multiple thread's dump makes it
> > > hard to interpret.
> > 
> > hm, it's very close to what printk-safe does [and printk-nmi, of course].
> > the difference is that buffered-printk does not disable local IRQs,
> > unlike printk-safe, which has to do it by design. so the question is,
> > can buffered-printk impose atomicity requirements? it seems that it can
> > (am I wrong?). and, if so, then can we use printk-safe instead? we can
> > add a new printk_buffered_begin/printk_buffered_end API, for example,
> > (or enter/exit) for that purpose, that would set a buffered-printk
> > `printk_context' bit so we can flush buffers in a "special way", not via IRQ
> > work, and may be avoid message loss (printk-safe buffers are bigger in size
> > than proposed PAGE_SIZE buffers).
> 
> printk_buffered_begin()/printk_buffered_end() corresponds to
> get_printk_buffer()/put_printk_buffer().
> printk_context() distinguishes atomic contexts.
> flush_printk_buffer() flushes from non-NMI context.
> 
> What does atomicity requirements mean?

what I meant was -- "can we sleep under printk_buffered_begin() or not".
printk-safe disables local IRQs. so what I propose is something like this

	printk-safe-enter    //disable local IRQs, use per-CPU buffer
	backtrace
	printk-safe-exit     //flush per-CPU buffer, enable local IRQs

except that 'printk-safe-enter/exit' will have new names here, say
printk-buffered-begin/end, and, probably, handle flush differently.


> > hm, 16 is rather random, it's too much for UP and probably not enough for
> > a 240 CPUs system. for the time being there are 3 buffered-printk users
> > (as far as I can see), but who knows how more will be added in the future.
> > each CPU can have overlapping printks from process, IRQ and NMI contexts.
> > for NMI we use printk-nmi buffers, so it's out of the list; but, in general,
> > *it seems* that we better depend on the number of CPUs the system has.
> > which, once again, returns us back to printk-safe...
> > 
> > thoughts?
> 
> I can make 16 a CONFIG_ option.

but still, why use N additional buffers when we already have per-CPU
buffers? what am I missing?

> Would you read 201705031521.EIJ39594.MFtOVOHSFLFOJQ@I-love.SAKURA.ne.jp ?

sure.

> But as long as actually writing to console devices is slow, message loss
> is inevitable no matter how big buffer is used. Rather, I'd expect an API
> which allows printk() users in schedulable context (e.g. kmallocwd and/or
> warn_alloc() for reporting allocation stalls) to wait until written to
> console devices. That will more likely to reduce message loss.

hm, from a schedulable context you can do *something* like

	console_lock()
	printk()
	...
	printk()
	console_unlock()


you won't be able to console_lock() until all pending messages are
flushed. since you are in a schedulable context, you can sleep on
console_sem in console_lock(). well, just saying.


> > > +	while (1) {
> > > +		char *text = ptr->buf;
> > > +		unsigned int text_len = ptr->used;
> > > +		char *cp = memchr(text, '\n', text_len);
> > > +		char c;
> > 
> > what guarantees that there'll always be a terminating newline?
> 
> Nothing guarantees. Why need such a guarantee?

 : The memchr() and memrchr() functions return a pointer to the matching
 : byte or NULL if the character does not occur in the given memory area.


so `cp' can be NULL here?

+               if (cp++)
+                       text_len = cp - text;
+               else if (all)
+                       cp = text + text_len;
+               else
+                       break;
+               /* printk_get_level() depends on text '\0'-terminated. */
+               c = *cp;
+               *cp = '\0';
+               process_log(0, LOGLEVEL_DEFAULT, NULL, 0, text, text_len);
+               ptr->used -= text_len;
+               if (!ptr->used)
+                       break;
+               *cp = c;

	-ss

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-05-09  1:04     ` Sergey Senozhatsky
@ 2017-05-09 11:41       ` Tetsuo Handa
  2017-05-10  6:21         ` Sergey Senozhatsky
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2017-05-09 11:41 UTC (permalink / raw)
  To: sergey.senozhatsky
  Cc: sergey.senozhatsky.work, linux-kernel, mingo, peterz, pmladek, rostedt

Sergey Senozhatsky wrote:
> Hello,
> 
> On (05/08/17 22:05), Tetsuo Handa wrote:
> > > On (04/30/17 22:54), Tetsuo Handa wrote:
> > > > Sometimes we want to printk() multiple lines in a group without being
> > > > disturbed by concurrent printk() from interrupts and/or other threads.
> > > > For example, mixed printk() output of multiple thread's dump makes it
> > > > hard to interpret.
> > > 
> > > hm, it's very close to what printk-safe does [and printk-nmi, of course].
> > > the difference is that buffered-printk does not disable local IRQs,
> > > unlike printk-safe, which has to do it by design. so the question is,
> > > can buffered-printk impose atomicity requirements? it seems that it can
> > > (am I wrong?). and, if so, then can we use printk-safe instead? we can
> > > add a new printk_buffered_begin/printk_buffered_end API, for example,
> > > (or enter/exit) for that purpose, that would set a buffered-printk
> > > `printk_context' bit so we can flush buffers in a "special way", not via IRQ
> > > work, and may be avoid message loss (printk-safe buffers are bigger in size
> > > than proposed PAGE_SIZE buffers).
> > 
> > printk_buffered_begin()/printk_buffered_end() corresponds to
> > get_printk_buffer()/put_printk_buffer().
> > printk_context() distinguishes atomic contexts.
> > flush_printk_buffer() flushes from non-NMI context.
> > 
> > What does atomicity requirements mean?
> 
> what I meant was -- "can we sleep under printk_buffered_begin() or not".
> printk-safe disables local IRQs. so what I propose is something like this
> 
> 	printk-safe-enter    //disable local IRQs, use per-CPU buffer
> 	backtrace
> 	printk-safe-exit     //flush per-CPU buffer, enable local IRQs
> 
> except that 'printk-safe-enter/exit' will have new names here, say
> printk-buffered-begin/end, and, probably, handle flush differently.

OK. Then the answer is that we are allowed to sleep after get_printk_buffer()
if get_printk_buffer() was called from a schedulable context, because a
different printk_buffer will be assigned by get_printk_buffer() when it is
called from a non-schedulable context.
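
That per-context assignment can be sketched in a small userspace model
(hypothetical code, not the patch itself: `ctx' stands for an opaque context
id such as a task pointer, and C11 atomics replace the kernel's cmpxchg):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_BUFFERS 16

struct printk_buffer {
	_Atomic long owner;	/* 0 == free, otherwise an opaque context id */
	char buf[4096];
	unsigned int used;
};

static struct printk_buffer printk_buffers[NUM_BUFFERS];

/*
 * Try to assign a buffer to context @ctx (hypothetically a task pointer,
 * or a per-CPU cookie in IRQ context).  NULL means all slots are taken
 * and the caller falls back to plain printk() -- the "best effort" part.
 */
static struct printk_buffer *get_printk_buffer(long ctx)
{
	for (size_t i = 0; i < NUM_BUFFERS; i++) {
		long expected = 0;

		if (atomic_compare_exchange_strong(&printk_buffers[i].owner,
						   &expected, ctx))
			return &printk_buffers[i];
		if (expected == ctx)	/* nested call from the same context */
			return &printk_buffers[i];
	}
	return NULL;
}

static void put_printk_buffer(struct printk_buffer *ptr)
{
	ptr->used = 0;		/* a real implementation flushes first */
	atomic_store(&ptr->owner, 0);
}
```

Since ownership is keyed on the context id rather than the CPU, the owner
may sleep and migrate between CPUs without losing its buffer.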

> 
> 
> > > hm, 16 is rather random, it's too much for UP and probably not enough for
> > > a 240 CPUs system. for the time being there are 3 buffered-printk users
> > > (as far as I can see), but who knows how more will be added in the future.
> > > each CPU can have overlapping printks from process, IRQ and NMI contexts.
> > > for NMI we use printk-nmi buffers, so it's out of the list; but, in general,
> > > *it seems* that we better depend on the number of CPUs the system has.
> > > which, once again, returns us back to printk-safe...
> > > 
> > > thoughts?
> > 
> > I can make 16 a CONFIG_ option.
> 
> but still, why use additional N buffers, when we already have per-CPU
> buffers? what am I missing?

Per-CPU buffers need to disable preemption by disabling local hard
IRQs / soft IRQs. But printk_buffers need not disable preemption.

> 
> > Would you read 201705031521.EIJ39594.MFtOVOHSFLFOJQ@I-love.SAKURA.ne.jp ?
> 
> sure.
> 
> > But as long as actually writing to console devices is slow, message loss
> > is inevitable no matter how big buffer is used. Rather, I'd expect an API
> > which allows printk() users in schedulable context (e.g. kmallocwd and/or
> > warn_alloc() for reporting allocation stalls) to wait until written to
> > console devices. That will more likely to reduce message loss.
> 
> hm, from a schedulable context you can do *something* like
> 
> 	console_lock()
> 	printk()
> 	...
> 	printk()
> 	console_unlock()
> 
> 
> you won't be able to console_lock() until all pending messages are
> flushed. since you are in a schedulable context, you can sleep on
> console_sem in console_lock(). well, just saying.

console_lock()/console_unlock() pair is different from what I want.

console_lock()/console_unlock() pair blocks as long as somebody else
is printk()ing. What I want is an API for

  current thread waits for N bytes to be written to console devices
  if current thread stored N bytes using printk(), but allow using some
  timeout and killable because waiting unconditionally forever is not good
  (e.g. current thread is expected to bail out soon if OOM-killed during
  waiting for N bytes to be written to console devices)

so that a kmallocwd-like thread can wait for a necessary and sufficient
period so as not to overflow the printk buffer. Even after we offload to a
dedicated kernel thread (e.g. "[RFC][PATCHv3 0/5] printk: introduce printing
kernel thread"), this API is needed to avoid overflowing the printk buffer.

"Wait for N bytes to be written to console devices" could be replaced by
"wait until usage of printk buffer becomes less than XX%".

> 
> 
> > > > +	while (1) {
> > > > +		char *text = ptr->buf;
> > > > +		unsigned int text_len = ptr->used;
> > > > +		char *cp = memchr(text, '\n', text_len);
> > > > +		char c;
> > > 
> > > what guarantees that there'll always be a terminating newline?
> > 
> > Nothing guarantees. Why need such a guarantee?
> 
>  : The memchr() and memrchr() functions return a pointer to the matching
>  : byte or NULL if the character does not occur in the given memory area.
> 
> 
> so `cp' can be NULL here?
> 

Yes.

> +               if (cp++)
> +                       text_len = cp - text;
> +               else if (all)
> +                       cp = text + text_len;
> +               else
> +                       break;
> +               /* printk_get_level() depends on text '\0'-terminated. */

But `cp' cannot be NULL here.

> +               c = *cp;
> +               *cp = '\0';
> +               process_log(0, LOGLEVEL_DEFAULT, NULL, 0, text, text_len);
> +               ptr->used -= text_len;
> +               if (!ptr->used)
> +                       break;
> +               *cp = c;
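
For illustration, the intended flushing semantics can be reproduced in a
small userspace sketch (not the patch's code: emit() is a hypothetical
stand-in for process_log(), and the buffer is compacted with memmove()
instead of the pointer bookkeeping above). A NULL from memchr() simply means
the trailing partial line stays buffered unless @all is set:

```c
#include <assert.h>
#include <string.h>

struct printk_buffer {
	char buf[4096];
	unsigned int used;
};

static unsigned int emitted;	/* bytes that reached the console path */

/* hypothetical stand-in for process_log() */
static void emit(const char *text, unsigned int len)
{
	(void)text;
	emitted += len;
}

/*
 * Flush complete lines from the buffer.  memchr() returning NULL means
 * "no newline in the buffered data": the partial line is kept for a
 * later flush unless @all is set.
 */
static void flush_buffer(struct printk_buffer *ptr, int all)
{
	while (ptr->used) {
		char *text = ptr->buf;
		char *cp = memchr(text, '\n', ptr->used);
		unsigned int text_len;

		if (cp)
			text_len = cp - text + 1;	/* include the '\n' */
		else if (all)
			text_len = ptr->used;		/* flush partial line too */
		else
			break;				/* keep partial line buffered */
		emit(text, text_len);
		ptr->used -= text_len;
		memmove(text, text + text_len, ptr->used);
	}
}
```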

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-05-09 11:41       ` Tetsuo Handa
@ 2017-05-10  6:21         ` Sergey Senozhatsky
  2017-05-10 11:27           ` Tetsuo Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2017-05-10  6:21 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky, sergey.senozhatsky.work, linux-kernel, mingo,
	peterz, pmladek, rostedt

On (05/09/17 20:41), Tetsuo Handa wrote:
[..]
> > what I meant was -- "can we sleep under printk_buffered_begin() or not".
> > printk-safe disables local IRQs. so what I propose is something like this
> > 
> > 	printk-safe-enter    //disable local IRQs, use per-CPU buffer
> > 	backtrace
> > 	printk-safe-exit     //flush per-CPU buffer, enable local IRQs
> > 
> > except that 'printk-safe-enter/exit' will have new names here, say
> > printk-buffered-begin/end, and, probably, handle flush differently.
> 
> OK. Then, answer is that we are allowed to sleep after get_printk_buffer()
> if get_printk_buffer() is called from schedulable context because different
> printk_buffer will be assigned by get_printk_buffer() if get_printk_buffer()
> is called from non-schedulable context.
> 
> > 
> > 
> > > > hm, 16 is rather random, it's too much for UP and probably not enough for
> > > > a 240 CPUs system. for the time being there are 3 buffered-printk users
> > > > (as far as I can see), but who knows how more will be added in the future.
> > > > each CPU can have overlapping printks from process, IRQ and NMI contexts.
> > > > for NMI we use printk-nmi buffers, so it's out of the list; but, in general,
> > > > *it seems* that we better depend on the number of CPUs the system has.
> > > > which, once again, returns us back to printk-safe...
> > > > 
> > > > thoughts?
> > > 
> > > I can make 16 a CONFIG_ option.
> > 
> > but still, why use additional N buffers, when we already have per-CPU
> > buffers? what am I missing?
> 
> Per-CPU buffers need to disable preemption by disabling local hard
> IRQ / soft IRQ. But printk_buffers need not to disable preemption.

yes. ok. it seems that I can't explain what I want.

my point is:

printk-buffered does not disable preemption and we can sleep under
printk-buffered-begin. fine. but why would you want to sleep there anyway?
you just want to print a backtrace and be done with it. and backtracing
does not sleep, afaiu, or at least it should not, because it must be
possible to dump_stack() from atomic context. so why have

	printk-buffered keeps preemption and irqs enabled and uses one
	of the aux buffers (if any).

instead of

	printk-buffered starts an atomic section - it disables preemption
	and local irqs, because it uses per-CPU buffer (which is always,
	and already, there).

?

[..]
> > hm, from a schedulable context you can do *something* like
> > 
> > 	console_lock()
> > 	printk()
> > 	...
> > 	printk()
> > 	console_unlock()
> > 
> > 
> > you won't be able to console_lock() until all pending messages are
> > flushed. since you are in a schedulable context, you can sleep on
> > console_sem in console_lock(). well, just saying.
> 
> console_lock()/console_unlock() pair is different from what I want.
> 
> console_lock()/console_unlock() pair blocks as long as somebody else
> is printk()ing. What I want is an API for
> 
>   current thread waits for N bytes to be written to console devices
>   if current thread stored N bytes using printk(), but allow using some
>   timeout and killable because waiting unconditionally forever is not good
>   (e.g. current thread is expected to bail out soon if OOM-killed during
>   waiting for N bytes to be written to console devices)

I assume you are talking here about a completely new API, not related to
the patch in question (because your patch does not do this). right?

	-ss

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-05-10  6:21         ` Sergey Senozhatsky
@ 2017-05-10 11:27           ` Tetsuo Handa
  2017-05-15 13:15             ` Petr Mladek
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2017-05-10 11:27 UTC (permalink / raw)
  To: sergey.senozhatsky.work
  Cc: sergey.senozhatsky, linux-kernel, mingo, peterz, pmladek, rostedt

Sergey Senozhatsky wrote:
> On (05/09/17 20:41), Tetsuo Handa wrote:
> [..]
> > > what I meant was -- "can we sleep under printk_buffered_begin() or not".
> > > printk-safe disables local IRQs. so what I propose is something like this
> > > 
> > > 	printk-safe-enter    //disable local IRQs, use per-CPU buffer
> > > 	backtrace
> > > 	printk-safe-exit     //flush per-CPU buffer, enable local IRQs
> > > 
> > > except that 'printk-safe-enter/exit' will have new names here, say
> > > printk-buffered-begin/end, and, probably, handle flush differently.
> > 
> > OK. Then, answer is that we are allowed to sleep after get_printk_buffer()
> > if get_printk_buffer() is called from schedulable context because different
> > printk_buffer will be assigned by get_printk_buffer() if get_printk_buffer()
> > is called from non-schedulable context.
> > 
> > > 
> > > 
> > > > > hm, 16 is rather random, it's too much for UP and probably not enough for
> > > > > a 240 CPUs system. for the time being there are 3 buffered-printk users
> > > > > (as far as I can see), but who knows how more will be added in the future.
> > > > > each CPU can have overlapping printks from process, IRQ and NMI contexts.
> > > > > for NMI we use printk-nmi buffers, so it's out of the list; but, in general,
> > > > > *it seems* that we better depend on the number of CPUs the system has.
> > > > > which, once again, returns us back to printk-safe...
> > > > > 
> > > > > thoughts?
> > > > 
> > > > I can make 16 a CONFIG_ option.
> > > 
> > > but still, why use additional N buffers, when we already have per-CPU
> > > buffers? what am I missing?
> > 
> > Per-CPU buffers need to disable preemption by disabling local hard
> > IRQ / soft IRQ. But printk_buffers need not to disable preemption.
> 
> yes. ok. seems that I can't explain what I want.
> 
> my point is:
> 
> printk-buffered does not disable preemption and we can sleep under
> printk-buffered-begin. fine. but why would you want to sleep there anyway?
> you just want to print a backtrace and be done with it. and backtracing
> does not sleep, afaiu, or it least it should not, because it must be
> possible to dump_stack() from atomic context. so why have
> 
> 	printk-buffered keeps preemption and irqs enable and uses one
> 	of aux buffers (if any).
> 
> instead of
> 
> 	printk-buffered starts an atomic section - it disables preemption
> 	and local irqs, because it uses per-CPU buffer (which is always,
> 	and already, there).
> 
> ?

A backtrace is just one example of a chunk where multiple lines are expected
to be printed together. In practice, backtraces are surrounded by preceding
and trailing lines. For example, an OOM killer message looks like

[   88.584836] a.out invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
[   88.590706] a.out cpuset=/ mems_allowed=0
[   88.594524] CPU: 1 PID: 2981 Comm: a.out Not tainted 4.11.0-rc7-next-20170421+ #588
[   88.599445] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[   88.604543] Call Trace:
[   88.606779]  ? dump_stack+0x5c/0x7d
[   88.608655]  ? dump_header+0x97/0x233
[   88.610083]  ? oom_kill_process+0x223/0x3e0
[   88.611511]  ? has_capability_noaudit+0x17/0x20
[   88.613046]  ? oom_badness+0xeb/0x160
[   88.614359]  ? out_of_memory+0x10b/0x490
[   88.615592]  ? __alloc_pages_slowpath+0x701/0x8e2
[   88.616947]  ? __alloc_pages_nodemask+0x1ed/0x210
[   88.618267]  ? alloc_pages_vma+0x9f/0x220
[   88.619475]  ? __handle_mm_fault+0xc22/0x11e0
[   88.620793]  ? handle_mm_fault+0xc5/0x220
[   88.622222]  ? __do_page_fault+0x21e/0x4b0
[   88.623903]  ? do_page_fault+0x2b/0x70
[   88.625235]  ? page_fault+0x28/0x30
[   88.626392] Mem-Info:
[   88.627314] active_anon:395525 inactive_anon:2096 isolated_anon:0
[   88.627314]  active_file:60 inactive_file:56 isolated_file:0
[   88.627314]  unevictable:0 dirty:7 writeback:0 unstable:0
[   88.627314]  slab_reclaimable:2211 slab_unreclaimable:6459
[   88.627314]  mapped:411 shmem:2162 pagetables:2418 bounce:0
[   88.627314]  free:13040 free_pcp:54 free_cma:0
[   88.637107] Node 0 active_anon:1582100kB inactive_anon:8384kB active_file:240kB inactive_file:224kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1644kB dirty:28kB writeback:0kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1349632kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   88.644027] Node 0 DMA free:7176kB min:408kB low:508kB high:608kB active_anon:8696kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:40kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   88.651384] lowmem_reserve[]: 0 1696 1696 1696
[   88.653015] Node 0 DMA32 free:44984kB min:44644kB low:55804kB high:66964kB active_anon:1573396kB inactive_anon:8384kB active_file:1268kB inactive_file:968kB unevictable:0kB writepending:28kB present:2080640kB managed:1756232kB mlocked:0kB slab_reclaimable:8844kB slab_unreclaimable:25804kB kernel_stack:5856kB pagetables:9632kB bounce:0kB free_pcp:248kB local_pcp:40kB free_cma:0kB
[   88.661952] lowmem_reserve[]: 0 0 0 0
[   88.663388] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (M) 2*1024kB (UM) 0*2048kB 1*4096kB (M) = 7140kB
[   88.666629] Node 0 DMA32: 1041*4kB (UME) 616*8kB (UE) 157*16kB (UE) 95*32kB (UE) 59*64kB (UME) 61*128kB (ME) 27*256kB (UME) 15*512kB (ME) 4*1024kB (UME) 0*2048kB 0*4096kB = 44916kB
[   88.671430] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[   88.674182] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[   88.676448] 2299 total pagecache pages
[   88.678121] 0 pages in swap cache
[   88.679472] Swap cache stats: add 0, delete 0, find 0/0
[   88.681159] Free swap  = 0kB
[   88.682698] Total swap = 0kB
[   88.683996] 524157 pages RAM
[   88.685385] 0 pages HighMem/MovableOnly
[   88.686970] 81123 pages reserved
[   88.688221] 0 pages cma reserved
[   88.689831] 0 pages hwpoisoned
[   88.691238] Out of memory: Kill process 2982 (a.out) score 999 or sacrifice child
[   88.693403] Killed process 2982 (a.out) total-vm:4172kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB

where showing memory-related information involves heavy pr_cont() usage.

If sysctl_oom_dump_tasks is set to 1, hundreds to a few thousand lines
could be added by dump_tasks(), and the per-CPU buffers might not be large
enough (e.g. CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 is only 8KB, isn't it?).

Surrounding oom_kill_process() with printk_safe_enter()/printk_safe_exit() is not
enough, and there is no need to disable hard IRQ during oom_kill_process().

I don't want to worry about the current thread migrating to different CPUs
during oom_kill_process(), or about messages stored during
printk_safe_enter()/printk_safe_exit() being written to console devices out
of order. Can the printk kernel thread guarantee ordering?

Yes, if an API for waiting for per-CPU printk buffers to be written to
console devices existed, we could scatter that API around so that the
per-CPU buffers do not overflow.

> 
> [..]
> > > hm, from a schedulable context you can do *something* like
> > > 
> > > 	console_lock()
> > > 	printk()
> > > 	...
> > > 	printk()
> > > 	console_unlock()
> > > 
> > > 
> > > you won't be able to console_lock() until all pending messages are
> > > flushed. since you are in a schedulable context, you can sleep on
> > > console_sem in console_lock(). well, just saying.
> > 
> > console_lock()/console_unlock() pair is different from what I want.
> > 
> > console_lock()/console_unlock() pair blocks as long as somebody else
> > is printk()ing. What I want is an API for
> > 
> >   current thread waits for N bytes to be written to console devices
> >   if current thread stored N bytes using printk(), but allow using some
> >   timeout and killable because waiting unconditionally forever is not good
> >   (e.g. current thread is expected to bail out soon if OOM-killed during
> >   waiting for N bytes to be written to console devices)
> 
> I assume you are talking here about a completely new API, not related to
> the patch in question (because your patch does not do this). right?

Right.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] printk: Add best-effort printk() buffering.
  2017-05-10 11:27           ` Tetsuo Handa
@ 2017-05-15 13:15             ` Petr Mladek
  0 siblings, 0 replies; 11+ messages in thread
From: Petr Mladek @ 2017-05-15 13:15 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sergey.senozhatsky.work, sergey.senozhatsky, linux-kernel, mingo,
	peterz, rostedt

On Wed 2017-05-10 20:27:32, Tetsuo Handa wrote:
> Sergey Senozhatsky wrote:
> > On (05/09/17 20:41), Tetsuo Handa wrote:
> > [..]
> > > > what I meant was -- "can we sleep under printk_buffered_begin() or not".
> > > > printk-safe disables local IRQs. so what I propose is something like this
> > > > 
> > > > 	printk-safe-enter    //disable local IRQs, use per-CPU buffer
> > > > 	backtrace
> > > > 	printk-safe-exit     //flush per-CPU buffer, enable local IRQs
> > > > 
> > > > except that 'printk-safe-enter/exit' will have new names here, say
> > > > printk-buffered-begin/end, and, probably, handle flush differently.
> > > 
> > > OK. Then, answer is that we are allowed to sleep after get_printk_buffer()
> > > if get_printk_buffer() is called from schedulable context because different
> > > printk_buffer will be assigned by get_printk_buffer() if get_printk_buffer()
> > > is called from non-schedulable context.
> > > 
> > > > 
> > > > 
> > > > > > hm, 16 is rather random, it's too much for UP and probably not enough for
> > > > > > a 240 CPUs system. for the time being there are 3 buffered-printk users
> > > > > > (as far as I can see), but who knows how more will be added in the future.
> > > > > > each CPU can have overlapping printks from process, IRQ and NMI contexts.
> > > > > > for NMI we use printk-nmi buffers, so it's out of the list; but, in general,
> > > > > > *it seems* that we better depend on the number of CPUs the system has.
> > > > > > which, once again, returns us back to printk-safe...
> > > > > > 
> > > > > > thoughts?
> > > > > 
> > > > > I can make 16 a CONFIG_ option.
> > > > 
> > > > but still, why use additional N buffers, when we already have per-CPU
> > > > buffers? what am I missing?
> > > 
> > > Per-CPU buffers need to disable preemption by disabling local hard
> > > IRQ / soft IRQ. But printk_buffers need not to disable preemption.
> > 
> > yes. ok. seems that I can't explain what I want.
> > 
> > my point is:
> > 
> > printk-buffered does not disable preemption and we can sleep under
> > printk-buffered-begin. fine. but why would you want to sleep there anyway?
> > you just want to print a backtrace and be done with it. and backtracing
> > does not sleep, afaiu, or it least it should not, because it must be
> > possible to dump_stack() from atomic context. so why have
> > 
> > 	printk-buffered keeps preemption and irqs enable and uses one
> > 	of aux buffers (if any).
> > 
> > instead of
> > 
> > 	printk-buffered starts an atomic section - it disables preemption
> > 	and local irqs, because it uses per-CPU buffer (which is always,
> > 	and already, there).
> > 
> > ?
> 
> Backtrace is just an example chunk where multiple lines are expected to be
> printed together. In practice, backtraces are surrounded by preceding lines
> and trailing lines. For example, an OOM killer message looks like
> 
> [   88.584836] a.out invoked oom-killer: gfp_mask=0x14280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),  order=0, oom_score_adj=0
> [   88.590706] a.out cpuset=/ mems_allowed=0
> [   88.594524] CPU: 1 PID: 2981 Comm: a.out Not tainted 4.11.0-rc7-next-20170421+ #588
> [   88.599445] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> [   88.604543] Call Trace:
> [   88.606779]  ? dump_stack+0x5c/0x7d
> [   88.608655]  ? dump_header+0x97/0x233
> [   88.610083]  ? oom_kill_process+0x223/0x3e0
> [   88.611511]  ? has_capability_noaudit+0x17/0x20
> [   88.613046]  ? oom_badness+0xeb/0x160
> [   88.614359]  ? out_of_memory+0x10b/0x490
> [   88.615592]  ? __alloc_pages_slowpath+0x701/0x8e2
> [   88.616947]  ? __alloc_pages_nodemask+0x1ed/0x210
> [   88.618267]  ? alloc_pages_vma+0x9f/0x220
> [   88.619475]  ? __handle_mm_fault+0xc22/0x11e0
> [   88.620793]  ? handle_mm_fault+0xc5/0x220
> [   88.622222]  ? __do_page_fault+0x21e/0x4b0
> [   88.623903]  ? do_page_fault+0x2b/0x70
> [   88.625235]  ? page_fault+0x28/0x30
> [   88.626392] Mem-Info:
> [   88.627314] active_anon:395525 inactive_anon:2096 isolated_anon:0
> [   88.627314]  active_file:60 inactive_file:56 isolated_file:0
> [   88.627314]  unevictable:0 dirty:7 writeback:0 unstable:0
> [   88.627314]  slab_reclaimable:2211 slab_unreclaimable:6459
> [   88.627314]  mapped:411 shmem:2162 pagetables:2418 bounce:0
> [   88.627314]  free:13040 free_pcp:54 free_cma:0
> [   88.637107] Node 0 active_anon:1582100kB inactive_anon:8384kB active_file:240kB inactive_file:224kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:1644kB dirty:28kB writeback:0kB shmem:8648kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1349632kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
> [   88.644027] Node 0 DMA free:7176kB min:408kB low:508kB high:608kB active_anon:8696kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:40kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [   88.651384] lowmem_reserve[]: 0 1696 1696 1696
> [   88.653015] Node 0 DMA32 free:44984kB min:44644kB low:55804kB high:66964kB active_anon:1573396kB inactive_anon:8384kB active_file:1268kB inactive_file:968kB unevictable:0kB writepending:28kB present:2080640kB managed:1756232kB mlocked:0kB slab_reclaimable:8844kB slab_unreclaimable:25804kB kernel_stack:5856kB pagetables:9632kB bounce:0kB free_pcp:248kB local_pcp:40kB free_cma:0kB
> [   88.661952] lowmem_reserve[]: 0 0 0 0
> [   88.663388] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (M) 2*1024kB (UM) 0*2048kB 1*4096kB (M) = 7140kB
> [   88.666629] Node 0 DMA32: 1041*4kB (UME) 616*8kB (UE) 157*16kB (UE) 95*32kB (UE) 59*64kB (UME) 61*128kB (ME) 27*256kB (UME) 15*512kB (ME) 4*1024kB (UME) 0*2048kB 0*4096kB = 44916kB
> [   88.671430] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
> [   88.674182] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [   88.676448] 2299 total pagecache pages
> [   88.678121] 0 pages in swap cache
> [   88.679472] Swap cache stats: add 0, delete 0, find 0/0
> [   88.681159] Free swap  = 0kB
> [   88.682698] Total swap = 0kB
> [   88.683996] 524157 pages RAM
> [   88.685385] 0 pages HighMem/MovableOnly
> [   88.686970] 81123 pages reserved
> [   88.688221] 0 pages cma reserved
> [   88.689831] 0 pages hwpoisoned
> [   88.691238] Out of memory: Kill process 2982 (a.out) score 999 or sacrifice child
> [   88.693403] Killed process 2982 (a.out) total-vm:4172kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
> 
> where showing memory related information involves full of pr_cont() usage.
> 
> If sysctl_oom_dump_tasks is set to 1, hundreds to a few thousands of lines
> could be added by dump_tasks() and per-CPU buffers might not be large enough
> (e.g. CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13 is only 8KB, isn't it?).

A solution might be to use separate buffers for this buffered
context but it would be great to reuse some of the printk_safe code.

This is one of my problems with this patch. It adds a lot of
non-trivial code and buffers that are handled in a very different
way than the others. It might be much more acceptable:

   + if it reuses some of the existing code for temporary
     buffering (from printk_safe.c)

   + if it is usable for continuation lines (transparently?) and allows
     removing the complex code around struct cont (immediately or in the
     medium term)


There is one catch with the transparent solution. Using per-CPU
buffers might require disabling preemption. But normal printk()
must not have such side effects. I mean that it would be quite error
prone if printk() disabled interrupts whenever it printed text
without a newline.

A solution might be to use a pool of buffers that could be assigned
to tasks. But we should still try to reuse the printk_safe code for
writing and flushing the buffers.


> Surrounding oom_kill_process() with printk_safe_enter()/printk_safe_exit() is not
> enough, and there is no need to disable hard IRQ during oom_kill_process().

As you say, there is no need to disable hard IRQ during oom_kill_process().
But will it cause any harm? As Sergey mentioned, would you want to
sleep with some messages buffered?

One big question is whether we want to use separate buffers in these
situations at all. Some of these reports are often printed when
the system is in a very bad state. We should try to get the messages
to the console immediately when the system is dying.


> I don't want to worry about current thread migrating to different CPUs during
> oom_kill_process() and messages stored during printk_safe_enter()/printk_safe_exit() are
> written to console devices out of order. Can the printk kernel thread guarantee ordering?

printk_safe_enter*() disables interrupts. As a result
the process cannot be migrated in this context and the ordering
is guaranteed.


Best Regards,
Petr

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-05-15 13:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-30 13:54 [PATCH] printk: Add best-effort printk() buffering Tetsuo Handa
2017-04-30 16:11 ` Joe Perches
2017-05-03  6:21   ` Tetsuo Handa
2017-05-03  9:30     ` Joe Perches
2017-05-08  7:05 ` Sergey Senozhatsky
2017-05-08 13:05   ` Tetsuo Handa
2017-05-09  1:04     ` Sergey Senozhatsky
2017-05-09 11:41       ` Tetsuo Handa
2017-05-10  6:21         ` Sergey Senozhatsky
2017-05-10 11:27           ` Tetsuo Handa
2017-05-15 13:15             ` Petr Mladek
