From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932451AbcI2M0u (ORCPT ); Thu, 29 Sep 2016 08:26:50 -0400
Received: from mx2.suse.de ([195.135.220.15]:50455 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754308AbcI2M0m (ORCPT ); Thu, 29 Sep 2016 08:26:42 -0400
Date: Thu, 29 Sep 2016 14:26:40 +0200
From: Petr Mladek
To: Sergey Senozhatsky
Cc: Jan Kara, Andrew Morton, Tejun Heo, Calvin Owens,
	linux-kernel@vger.kernel.org, Sergey Senozhatsky
Subject: Re: [RFC][PATCH 3/7] printk: introduce per-cpu alt_print seq buffer
Message-ID: <20160929122639.GD26796@pathway.suse.cz>
References: <20160927142237.5539-1-sergey.senozhatsky@gmail.com>
	<20160927142237.5539-4-sergey.senozhatsky@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160927142237.5539-4-sergey.senozhatsky@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 2016-09-27 23:22:33, Sergey Senozhatsky wrote:
> This patch extends the idea of NMI per-cpu buffers to regions
> that may cause recursive printk() calls and possible deadlocks.
> Namely, printk() can't handle printk() calls from scheduler code
> or printk() calls from lock debugging code (spin_dump(), for instance),
> because those may be called with `sem->lock' already taken or with any
> of the critical scheduler locks (p->pi_lock, etc.) held. An example of
> a deadlock can be
>
> vprintk_emit()
>  console_unlock()
>   up()                  << raw_spin_lock_irqsave(&sem->lock, flags);
>    wake_up_process()
>     try_to_wake_up()
>      ttwu_queue()
>       ttwu_do_activate()
>        ttwu_activate()
>         activate_task()
>          enqueue_task()
>           enqueue_task_fair()
>            cfs_rq_of()
>             task_of()
>              WARN_ON_ONCE(!entity_is_task(se))
>               vprintk_emit()
>                console_trylock()
>                 down_trylock()
>                  raw_spin_lock_irqsave(&sem->lock, flags)
>                  ^^^^ deadlock
>
> and some other cases.
>
> Just like in the NMI implementation, the solution uses a per-cpu
> `printk_func' pointer to 'redirect' printk() calls to a 'safe'
> callback that stores messages in a per-cpu buffer and flushes
> them back to the logbuf buffer later.
>
> Usage example:
>
> printk()
>  local_irq_save()
>  alt_printk_enter()

We need to make sure that exit() is called on the same CPU.
Therefore we need to disable preemption as well.

>  //
>  // any printk() call from here will end up in vprintk_alt(),
>  // that stores messages in a special per-CPU buffer.
>  //
>  alt_printk_exit()
>  local_irq_restore()
>
> The patch only adds alt_printk support, we don't use it yet.
>
> Signed-off-by: Sergey Senozhatsky
> ---
>  kernel/printk/alt_printk.c | 160 +++++++++++++++++++++++++++++++++++++++++----
>  kernel/printk/internal.h   |  12 ++++
>  2 files changed, 158 insertions(+), 14 deletions(-)
>
> diff --git a/kernel/printk/alt_printk.c b/kernel/printk/alt_printk.c
> index 7178661..8978390 100644
> --- a/kernel/printk/alt_printk.c
> +++ b/kernel/printk/alt_printk.c
> @@ -1,5 +1,5 @@
>  /*
> - * alt_printk.c - Safe printk in NMI context
> + * alt_printk.c - Safe printk for printk-deadlock-prone contexts
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> @@ -53,24 +53,39 @@ struct alt_printk_seq_buf {
>  static DEFINE_PER_CPU(struct alt_printk_seq_buf, nmi_print_seq);
>
>  /*
> - * Safe printk() for NMI context.  It uses a per-CPU buffer to
> - * store the message.
> - * NMIs are not nested, so there is always only
> - * one writer running.  But the buffer might get flushed from another
> - * CPU, so we need to be careful.
> + * There can be at most two alt_printk contexts - a `normal' alt_printk
> + * context and an NMI alt_printk context. The normal alt_printk context
> + * is the one that the direct caller of printk() sets up (either a
> + * process or an IRQ) and it can be preempted only by NMI (if the
> + * platform supports NMI). The NMI context can preempt the normal
> + * alt_printk context, but cannot itself be preempted.
>   */
> -static int vprintk_nmi(const char *fmt, va_list args)
> +#ifdef CONFIG_PRINTK_NMI
> +#define MAX_ALT_PRINTK_CTX 2
> +#else
> +#define MAX_ALT_PRINTK_CTX 1
> +#endif
> +
> +struct alt_printk_ctx {
> +        atomic_t idx;
> +        unsigned int entry_count;
> +        printk_func_t saved_printk_func[MAX_ALT_PRINTK_CTX];
> +};
> +
> +static DEFINE_PER_CPU(struct alt_printk_seq_buf, alt_print_seq);
> +static DEFINE_PER_CPU(struct alt_printk_ctx, alt_printk_ctx);
> +
> +static int alt_printk_log_store(struct alt_printk_seq_buf *s,
> +                const char *fmt, va_list args)
>  {
> -        struct alt_printk_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
> -        int add = 0;
> +        int add;
>          size_t len;
>
>  again:
>          len = atomic_read(&s->len);
>
> -        if (len >= sizeof(s->buffer)) {
> -                atomic_inc(&nmi_message_lost);
> +        if (len >= sizeof(s->buffer))
>                  return 0;
> -        }
>
>          /*
>           * Make sure that all old data have been read before the buffer was
> @@ -99,6 +114,96 @@ static int vprintk_nmi(const char *fmt, va_list args)
>          return add;
>  }
>
> +/*
> + * Lockless printk(), to avoid deadlocks should the printk() recurse
> + * into itself. It uses a per-CPU buffer to store the message, just like
> + * NMI.
> + */
> +static int vprintk_alt(const char *fmt, va_list args)
> +{
> +        struct alt_printk_seq_buf *s = this_cpu_ptr(&alt_print_seq);
> +
> +        return alt_printk_log_store(s, fmt, args);
> +}
> +
> +/*
> + * We must keep track of `printk_func' because the alt_printk
> + * context can be preempted by the NMI alt_printk context.
> + *
> + * Consider the following example:
> + *
> + * vprintk_emit()
> + *  alt_printk_enter()
> + *   printk_func = vprintk_alt;
> + *
> + *  -> NMI
> + *    printk_nmi_enter()
> + *     printk_func = vprintk_nmi;
> + *    printk_nmi_exit()
> + *     printk_func = vprintk_default;
> + *                   ^^^^^^^^^^^
> + *  <- NMI
> + *
> + *  printk("foo") -> vprintk_default();
> + *
> + * Thus we must restore the original `printk_func' value, the one
> + * NMI saw at printk_nmi_enter() time.
> + */
> +static void __lockless_printk_enter(printk_func_t new_func)
> +{
> +        struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
> +        int idx = atomic_inc_return(&ctx->idx) - 1;
> +
> +        ctx->saved_printk_func[idx] = this_cpu_read(printk_func);
> +        this_cpu_write(printk_func, new_func);
> +}
> +
> +static void __lockless_printk_exit(void)
> +{
> +        struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
> +        int idx = atomic_read(&ctx->idx) - 1;
> +
> +        this_cpu_write(printk_func, ctx->saved_printk_func[idx]);
> +        atomic_dec(&ctx->idx);
> +}
> +
> +/* Local IRQs must be disabled; can be preempted by NMI. */
> +void alt_printk_enter(void)
> +{
> +        struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
> +
> +        /*
> +         * We can't switch `printk_func' while the CPU is flushing its
> +         * alternative buffers. At the same time, we leave flushing
> +         * `unprotected', because we always use vprintk_default() there.
> +         *
> +         * ->entry_count can detect printk() recursion from the flushing
> +         * context:
> +         * -- alt_printk_flush() sets ->entry_count to 1
> +         * -- every vprintk_default() call from alt_printk_flush() increments
> +         *    ->entry_count to 2 when it enters the recursion-unsafe region
> +         *    and decrements it back to 1 when it leaves that region
> +         * -- thus, if printk() recurses from the recursion-unsafe region, we
> +         *    will see ->entry_count > 2.
> +         */
> +        ctx->entry_count++;
> +        if (ctx->entry_count > 1)
> +                return;
> +
> +        /* @TODO: do something sensible in case of printk() recursion */
> +
> +        __lockless_printk_enter(vprintk_alt);
> +}
> +
> +/* Local IRQs must be disabled; can be preempted by NMI. */
> +void alt_printk_exit(void)
> +{
> +        struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
> +
> +        if (ctx->entry_count == 1)
> +                __lockless_printk_exit();
> +        ctx->entry_count--;
> +}

Huh, this is complicated like hell. I think it ended up this way for
two reasons. First, it tries to change several values in a lockless
way. Second, it tries to be kind of generic.

Note that it is not all that generic. idx is increased or decreased
only when entry_count == 1 or when called from nmi_enter()/exit().
Therefore it could not easily be extended to yet another alt buffer.
Also, it relies on the fact that the nmi_enter()/exit() calls are not
nested.

What do you think about my approach with the printk_context per-CPU
value from the WARN_DEFERRED() patchset? The main idea is that the
entry()/exit() functions manipulate a preempt_count-like per-CPU
variable. The printk() function then selects the safe implementation
according to the current state.

>  static void alt_printk_flush_line(const char *text, int len)
>  {
>          /*
> @@ -110,7 +215,6 @@ static void alt_printk_flush_line(const char *text, int len)
>                  printk_deferred("%.*s", len, text);
>          else
>                  printk("%.*s", len, text);
> -
>  }
>
>  /*
> @@ -135,6 +239,7 @@ static void __alt_printk_flush(struct irq_work *work)
>                  __RAW_SPIN_LOCK_INITIALIZER(read_lock);
>          struct alt_printk_seq_buf *s = container_of(work,
>                          struct alt_printk_seq_buf, work);
> +        struct alt_printk_ctx *ctx = this_cpu_ptr(&alt_printk_ctx);
>          unsigned long flags;
>          size_t len, size;
>          int i, last_i;
> @@ -147,6 +252,11 @@ static void __alt_printk_flush(struct irq_work *work)
>           * a backtrace.
>           */
>          raw_spin_lock_irqsave(&read_lock, flags);
> +        /*
> +         * Forbid alt_printk on this CPU, we want to flush messages to
> +         * logbuf, not to the alt_printk buffer again.
> +         */
> +        ctx->entry_count++;

This looks very strange. If entry_count is not zero, it means that we
are not in a safe context to flush the alternative buffer. Or am I
missing something?

My other concern about this approach was that it would spread printk()
messages to even more buffers. But I am not scared anymore. The new
buffer will be used only for printk-internal errors that would normally
cause a deadlock.

The important thing is that we are able to share the implementation
that handles the extra buffer.

So I think that this approach has a chance if we clean it up a bit.

Best Regards,
Petr
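
PS: To make the printk_context idea above a bit more concrete, here is
a minimal sketch. The variable name, the mask values and the
vprintk_func() dispatcher below are only illustrative (they are not
copied from the WARN_DEFERRED() patchset); the point is just that the
enter()/exit() calls only modify one per-CPU state word and printk()
picks the right implementation from it:

#include <linux/percpu.h>
#include <linux/printk.h>
/* vprintk_alt(), vprintk_nmi(), vprintk_default() as in this series */

/* Illustrative sketch only - names and mask values are made up. */
#define ALT_PRINTK_NMI_CONTEXT_MASK	0x80000000
#define ALT_PRINTK_SAFE_CONTEXT_MASK	0x7fffffff

static DEFINE_PER_CPU(int, printk_context);

/* Callers run with local IRQs (and thus preemption) disabled. */
void alt_printk_enter(void)
{
	this_cpu_inc(printk_context);
}

void alt_printk_exit(void)
{
	this_cpu_dec(printk_context);
}

void printk_nmi_enter(void)
{
	this_cpu_or(printk_context, ALT_PRINTK_NMI_CONTEXT_MASK);
}

void printk_nmi_exit(void)
{
	this_cpu_and(printk_context, ~ALT_PRINTK_NMI_CONTEXT_MASK);
}

/* printk() ends up here and selects the implementation by state. */
int vprintk_func(const char *fmt, va_list args)
{
	if (this_cpu_read(printk_context) & ALT_PRINTK_NMI_CONTEXT_MASK)
		return vprintk_nmi(fmt, args);

	if (this_cpu_read(printk_context) & ALT_PRINTK_SAFE_CONTEXT_MASK)
		return vprintk_alt(fmt, args);

	return vprintk_default(fmt, args);
}

With something like this there is no need to save and restore
`printk_func' at all: the NMI bit simply takes precedence over the
counter in the low bits, so the NMI-preempts-alt_printk nesting is
handled naturally.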