From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1759700AbcHaJjr (ORCPT <rfc822;w@1wt.eu>);
        Wed, 31 Aug 2016 05:39:47 -0400
Received: from mx2.suse.de ([195.135.220.15]:36206 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1758992AbcHaJii (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 31 Aug 2016 05:38:38 -0400
Date: Wed, 31 Aug 2016 11:38:10 +0200
From: Petr Mladek <pmladek@suse.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jan Kara <jack@suse.cz>,
        Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
        Viresh Kumar <viresh.kumar@linaro.org>,
        Andrew Morton <akpm@linux-foundation.org>, Jan Kara <jack@suse.com>,
        Tejun Heo <tj@kernel.org>,
        Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Byungchul Park <byungchul.park@lge.com>, vlevenetz@mm-sol.com,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH v10 1/2] printk: Make printk() completely async
Message-ID: <20160831093810.GE4554@pathway.suse.cz>
References: <20160818105629.GE26194@pathway.suse.cz>
 <20160819063236.GA584@swordfish>
 <20160819095455.GR13300@pathway.suse.cz>
 <20160819190007.GA8275@quack2.suse.cz>
 <20160820052430.GA695@swordfish>
 <20160822041520.GA511@swordfish>
 <20160825210959.GA2273@dhcp128.suse.cz>
 <20160826015641.GA520@swordfish>
 <20160830092912.GP4866@pathway.suse.cz>
 <20160831023134.GA390@swordfish>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160831023134.GA390@swordfish>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 2016-08-31 11:31:35, Sergey Senozhatsky wrote:
> On (08/30/16 11:29), Petr Mladek wrote:
> > > you didn't miss anything, I think I wasn't too descriptive and that caused
> > > some confusion. this patch is not a replacement of wake_up_process() patch
> > > posted earlier in the loop, but an addition to it. not only every WARN/BUG
> > > issued from wake_up_process() will do no good, but every lock we take is
> > > potentially dangerous as well. In the simplest case because of $LOCK-debug.c
> > > files in kernel/locking (spin_lock in our case); in the worst case --
> > > because of WARNs issued by log_store() and friends (there may be custom
> > > modifications) or by violations of spinlock atomicity requirements.
> > > 
> > > For example,
> > > 
> > > 	vprintk_emit()
> > > 		local_irq_save()
> > > 		raw_spin_lock()
> > > 		text_len = vscnprintf(text, sizeof(textbuf), fmt, args)
> > > 		{
> > > 			vsnprintf()
> > > 			{
> > > 				if (WARN_ON_ONCE(size > INT_MAX))
> > > 					return 0;
> > > 			}
> > > 		}
> > > 		...
> > > 
> > > this is a rather unlikely event, sure, there must be some sort of
> > > memory corruption or something else, but the thing is -- if it will
> > > happen, printk() will not be willing to help.
> > > 
> > > wake_up_process() change, posted earlier, is using a deferred version of
> > > WARN macro, but we definitely can (and we better do) switch to lockless
> > > alternative printk() in both cases and don't bother with new macros.
> > > replacing all of the existing ones with 'safe' deferred versions is
> > > a difficult task, but keeping track of a newly introduced ones is even
> > > harder (if possible at all).
> > 
> > I see. It makes some sense. I would like to be on the safe side. I am
> > just afraid that adding yet another per-CPU buffer is too complex.
> > It adds quite some complexity to the code. And it even more scatters
> > the messages so that it will be harder to get them from the
> > crash dump or flush them to the console when the system goes down.
> > 
> > It took few years to get in the solution for NMIs even when
> > it fixed real life deadlocks for many people and customers.
> > I am afraid that it is not realistic to get in similar complex
> > code to fix rather theoretical problems.
> 
> well, I still can try it in my spare time. we can't fix printk() without
> ever touching it, can we? so far we basically only acknowledge the
> existing printk() problems. we can do better than that, I think.

Ah, I do not want to discourage you from finding a solution for these
problems. I just wanted to point out problems with this particular
path of thinking (more per-CPU buffers, shuffling data between
them and the main buffer and console). But I might be wrong.

Sigh, there are many problems with printk(). I think the we recently
discussed the following problems:

  1. Hung task or blocked irq handler when preemption/irqs
     are disabled and there are too many messages pushed to
     the console.

  2. Potential deadlocks when calling wake_up_process() by
     async printk and console_unlock().

  3. Clean up the console handling to split manipulation with
     consoles setting and pushing the messages. By other words,
     allow to push the console messages only when wanted.

  4. Messed output with continuous lines.


They are related but only partly. IMHO, it is not realistic to
fix all the problems in a single patchset. I wonder how to move
forward.

Our primary target was to solve the 1st problem with the async printk.
It has stalled because we hit the other areas. Let's look at them
from this point of view.


Ad 2. The potential deadlock with wake_up_process(). It pooped up
      with using async printk during the suspend.

      But it is not new! up() called by console_unlock() has the
      same problem. I thought that it was different because
      console_trylock() would prevent recursion but I was wrong.
      There seems to be similar deadlock:

      console_unlock()
	up_console_sem()
	  up()
	    __up()
	      raw_spin_lock_irqsave(&sem->lock, flags);
	      wake_up_process()
		WARN()
		  printk()
		    vprintk_emit()
		      console_trylock()
			down_trylock_console_sem()
			  __down_trylock_console_sem)()
			    down_trylock()
			      raw_spin_lock_irqsave(&sem->lock, flags);

      DEADLOCK: because sem->lock is already taken by __up()

      We hit the deadlock quickly when the printk kthread used SCHED_FIFO.
      But it must be almost impossible to hit it (hit a WARN() with
      the normal scheduler. Otherwise, people would hit it also by
      console_unlock().

      => If we use normal scheduler for the printk() thread, we should
      be on the safe side. The deadlock should get fixed but it
      will be enough to fix it later separately.


Ad 3. The clean up of the console code might change some things. But it
      it is a huge task with unclear output. Anyway, we want to push
      messages to the console from printk(), so it will not help
      to avoid the async printk. It is a completely separate problem
      from my point of view.


Ad 4. The problems with messaged continues lines got a bit more
      visible with the async printk.

      It is because a partially flushed cont buffer is blocked
      until the rest of the line is flushed. This happens
      more likely when the messages are flushed by a separate
      process. I have proposed a fix that read the end of the
      partially printed line from the main ring buffer. It
      made the code easier and might be a low risk change.

      In each case, I would not mix it with a perfect solution
      of the continuous lines. IMHO, it would be too complex
      and it is not worth it.

Did I miss anything?

I wonder how to separate the problems and make them more manageable.

Best Regards,
Petr