From: Tetsuo Handa
To: joe@perches.com, linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, pmladek@suse.com,
    sergey.senozhatsky@gmail.com, rostedt@goodmis.org
Subject: Re: [PATCH] printk: Add best-effort printk() buffering.
Date: Wed, 3 May 2017 15:21:56 +0900
Message-Id: <201705031521.EIJ39594.MFtOVOHSFLFOJQ@I-love.SAKURA.ne.jp>
In-Reply-To: <1493568716.1874.7.camel@perches.com>
References: <1493560477-3016-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp>
 <1493568716.1874.7.camel@perches.com>

Joe Perches wrote:
> On Sun, 2017-04-30 at 22:54 +0900, Tetsuo Handa wrote:
> > Sometimes we want to printk() multiple lines in a group without being
> > disturbed by concurrent printk() from interrupts and/or other threads.
> > For example, mixed printk() output of multiple threads' dumps makes it
> > hard to interpret.
> >
> > This patch introduces fixed-sized statically allocated buffers for
> > buffering printk() output for each thread/context in best effort
> > (i.e. up to PAGE_SIZE bytes, up to 16 concurrent printk() users).
> []
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> []
> > +#define MAX_PRINTK_BUFFERS 16
> > +static struct printk_buffer {
> > +	unsigned long context;	/* printk_context() */
> > +	unsigned int nested;
> > +	unsigned int used;	/* Valid bytes in buf[]. */
> > +	char buf[PAGE_SIZE];
> > +} printk_buffers[MAX_PRINTK_BUFFERS];
>
> Perhaps these buffers could be acquired by
> alloc_page rather than be static structures and
> the sizeof buf[PAGE_SIZE] should be reduced by
> 	sizeof(unsigned long) +
> 	sizeof(unsigned int) +
> 	sizeof(unsigned int)
> so that struct printk_buffer is exactly
> PAGE_SIZE.

When should the buffers be allocated?

If upon boot, there will be little difference from static allocation. If
on first use of each buffer, we introduce a risk of failing to allocate
memory with alloc_page(GFP_ATOMIC | __GFP_NOWARN), and a risk of stack
overflow inside alloc_page(), because printk() has to be prepared for
being called from stack-tight situations.

Also, while dynamic allocation would allow keeping the buffers on a
linked list, we would need to introduce a lock for traversing that list,
which might become more expensive than walking a fixed-sized array of
buffers. We could avoid list traversal by passing a "struct
printk_buffer" argument around, but since there are so many functions
which expect pr_cont() behavior, scattering a "struct printk_buffer"
argument would be a big patch. We could avoid list traversal by storing
a "struct printk_buffer *[3]" array inside "struct task_struct" (for
hard IRQ context, soft IRQ context and task context), but occupying
sizeof(struct printk_buffer *) * 3 bytes per task only for buffers which
will be used only for a moment is not smart.

Thus, I think fixed-sized statically allocated buffers are the most
reasonable choice.
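To illustrate what "walking a fixed-sized array" means here, a minimal
sketch of the lookup helper, written against the printk_buffers[] array
quoted above (this is just an illustration of the idea; the actual
find_printk_buffer() in the patch may differ in detail):

static struct printk_buffer *find_printk_buffer(void)
{
	const unsigned long context = printk_context();
	int i;

	/*
	 * No lock is needed for this walk: a slot's ->context field is
	 * claimed atomically via cmpxchg() and matches only the context
	 * that claimed it, so a stale read simply skips that slot.
	 */
	for (i = 0; i < MAX_PRINTK_BUFFERS; i++)
		if (printk_buffers[i].context == context)
			return &printk_buffers[i];
	return NULL;
}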
Using a CONFIG_ option for controlling how many pages should be
allocated for "struct printk_buffer" might make sense for systems with
little RAM.

Tetsuo Handa wrote:
> +void get_printk_buffer(void)
> +{
> +	const unsigned long context = printk_context();
> +	int i;
> +
> +	/* No-op if called from NMI context. */
> +	if ((context & 3) == 3)
> +		return;
> +	for (i = 0; i < MAX_PRINTK_BUFFERS; i++) {
> +		struct printk_buffer *ptr = &printk_buffers[i];
> +
> +		if (ptr->context != context) {
> +			if (ptr->context ||
> +			    cmpxchg(&ptr->context, 0, context))
> +				continue;
> +			ptr->nested = 0;
> +			ptr->used = 0;
> +		} else {
> +			ptr->nested++;
> +		}
> +		break;
> +	}
> +}

Oops, I over-optimized here. We must not reserve a ptr->context == 0
slot before checking whether some slot already holds this context, or a
nested call may grab a second slot instead of reusing the outer one. It
should be:

void get_printk_buffer(void)
{
	const unsigned long context = printk_context();
	struct printk_buffer *ptr;
	int i;

	/* No-op if called from NMI context. */
	if ((context & 3) == 3)
		return;
	ptr = find_printk_buffer();
	if (ptr) {
		ptr->nested++;
		return;
	}
	for (i = 0; i < MAX_PRINTK_BUFFERS; i++) {
		ptr = &printk_buffers[i];
		if (ptr->context || cmpxchg(&ptr->context, 0, context))
			continue;
		ptr->nested = 0;
		ptr->used = 0;
		break;
	}
}

> int vprintk_default(const char *fmt, va_list args)
> {
> +	struct printk_buffer *ptr;
> +	va_list args2;
> +	unsigned int i;
> 	int r;
>
> #ifdef CONFIG_KGDB_KDB
> @@ -1806,6 +1960,43 @@ int vprintk_default(const char *fmt, va_list args)
> 		return r;
> 	}
> #endif
> +	ptr = find_printk_buffer();
> +	if (!ptr)
> +		goto use_unbuffered;
> +	/*
> +	 * Try to store to printk_buffer first. If it fails, flush completed
> +	 * lines in printk_buffer, and then try to store to printk_buffer
> +	 * again. If it still fails, flush incomplete line in printk_buffer
> +	 * and use unbuffered printing.
> +	 *
> +	 * Since printk_buffer is identified by current thread and interrupt
> +	 * context and same level of context does not recurse, we don't need
> +	 * logbuf_lock_irqsave()/logbuf_unlock_irqrestore() here except
> +	 * __flush_printk_buffer().
> +	 */

Before I continue, I'd like to confirm that this assumption is correct.
My understanding is that

  (1) task context can be preempted by soft IRQ context, hard IRQ
      context and NMI context.

  (2) soft IRQ context can be preempted by hard IRQ context and NMI
      context.

  (3) hard IRQ context can be preempted by NMI context.

  (4) a kernel-oops event can interrupt task context, soft IRQ context,
      hard IRQ context and NMI context, but the interrupted context
      cannot continue execution of vprintk_default() after returning
      from the kernel-oops event, even if the kernel-oops event occurred
      in schedulable context and panic_on_oops == 0.

and, as a result, it is guaranteed that a kernel-oops event which
occurred between

> +	for (i = 0; i < 2; i++) {

here

> +		unsigned int pos = ptr->used;
> +		char *text = ptr->buf + pos;
> +
> +		va_copy(args2, args);
> +		r = vsnprintf(text, sizeof(ptr->buf) - pos, fmt, args2);
> +		va_end(args2);
> +		if (r + pos < sizeof(ptr->buf)) {
> +			/*
> +			 * Eliminate KERN_CONT at this point because we can
> +			 * concatenate incomplete lines inside printk_buffer.
> +			 */
> +			if (r >= 2 && printk_get_level(text) == 'c') {
> +				memmove(text, text + 2, r - 2);
> +				ptr->used += r - 2;
> +			} else {
> +				ptr->used += r;
> +			}
> +			return r;
> +		}

and here shall not make ptr->used >= sizeof(ptr->buf) become true.
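(As an aside, before the rest of the hunk: the "(context & 3) == 3" test
and assumptions (1)-(3) presume that printk_context() returns a
per-thread value whose low two bits encode the context level. A minimal
sketch of the encoding I have in mind; purely illustrative, the actual
printk_context() in the patch may differ:

static unsigned long printk_context(void)
{
	/*
	 * "struct task_struct" is at least 4-byte aligned, so the low
	 * two bits of the pointer are free to encode the level.
	 * in_nmi() must be checked first because NMI entry also marks
	 * the hard IRQ bits in the preempt count.
	 */
	unsigned long base = (unsigned long) current;

	if (in_nmi())
		return base | 3;	/* NMI: get_printk_buffer() is a no-op. */
	if (in_irq())
		return base | 2;	/* hard IRQ context */
	if (in_serving_softirq())
		return base | 1;	/* soft IRQ context */
	return base;			/* task context */
}

Back to the hunk.)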
> +		__flush_printk_buffer(ptr, i);
> +	}
> +use_unbuffered:
> 	r = vprintk_emit(0, LOGLEVEL_DEFAULT, NULL, 0, fmt, args);
>
> 	return r;

Well, should vprintk_default() refrain from buffering if
oops_in_progress == 1 ?
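If it should, a minimal way to do that (an untested sketch on top of the
hunk above, not part of the posted patch) would be to check
oops_in_progress before committing to the buffer, so that output during
an oops bypasses buffering and reaches the consoles immediately:

 	ptr = find_printk_buffer();
-	if (!ptr)
+	/* Don't buffer while an oops is in progress. */
+	if (!ptr || oops_in_progress)
 		goto use_unbuffered;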