Date: Tue, 4 Jul 2017 16:03:16 +0200
From: Petr Mladek
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
    Peter Zijlstra, "Rafael J. Wysocki", Eric Biederman,
    Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
    Tetsuo Handa, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
Message-ID: <20170704140316.GK23069@pathway.suse.cz>
References: <20170509082859.854-1-sergey.senozhatsky@gmail.com>
 <20170509082859.854-3-sergey.senozhatsky@gmail.com>
 <20170510055935.GA1966@jagdpanzerIV.localdomain>
 <20170529092906.GD21894@pathway.suse.cz>
 <20170531072233.GC7672@jagdpanzerIV.localdomain>
 <20170628121925.GN1538@pathway.suse.cz>
 <20170629073321.GA475@jagdpanzerIV.localdomain>
 <20170630115457.GE23069@pathway.suse.cz>
 <20170630124224.GA792@jagdpanzerIV.localdomain>
In-Reply-To: <20170630124224.GA792@jagdpanzerIV.localdomain>
User-Agent: Mutt/1.5.21 (2010-09-15)
List-ID: linux-kernel@vger.kernel.org

On Fri 2017-06-30 21:42:24, Sergey Senozhatsky wrote:
> Hello,
>
> On (06/30/17 13:54), Petr Mladek wrote:
> > > but.....
> > > the opposite possibility is that messages either won't be printed
> > > soon (until next printk or console_unlock()) or won't be printed
> > > ever at all (in case of sudden system death). I don't think it's
> > > a good alternative.
> >
> > I see it like a weighing machine.
> > There is a "guaranteed" output on one side and softlockup
> > prevention on the other side. The more we prevent the softlockups,
> > the less we guarantee the output.
>
> I apply a very simple litmus test. if the answer to the question
> "so we leave console_unlock() and there are pending messages,
> who and when is going to flush the remaining messages?" is
> "something sometime in the future" then it's a no-no.
>
> "something sometime in the future" is equal to "no one".

I am afraid that it is hard or even impossible to have offloading and
be sure that the other side continues. IMHO, there will always be a
possibility that something could go wrong. I think that insisting on
keeping the current guarantees might force us to keep the current
code (status quo).

Please, let me better explain the problems that I see with the
current code.

> > My main unresolved doubts about this patchset are:
> >
> > 1. It gives other processes only a very small chance to take
> >    over the job. They either need to call console_trylock()
> >    in a very small "race" window or they need to call
> >    console_lock(). Where console_lock() only signals that the
> >    caller is willing to take the job and puts him to sleep.
>
> printk_kthread does console_lock().

But this is not enough. In fact, it makes things worse. Let me
explain:

We are talking about a flood of messages, where we go over
atomic_print_limit and activate offloading.
Let's look at it in more detail:

    CPU0                            CPU1

    console_unlock()

      // handle atomic_print_limit lines
      // enable offloading
      wake_up_process(printk_kthread)

      // handle many other lines
                                    // printk_kthread finally gets CPU
                                    // woken in printk_kthread_func()
                                    console_lock()
                                      down()
                                      // lock most likely taken by CPU0
                                        list_add_tail(&waiter.list)
                                        schedule()

      // handle one more line
      up_console_sem()
        list_del(&waiter.list)
        wake_up_process(waiter->task)

      console_trylock()
      // fails because the lock is
      // taken by the waiter

Regression 1: Nobody flushes the messages to the console until
printk_kthread() is scheduled again, really takes the console_lock(),
and calls console_unlock(). The original code flushed the messages
all the time until all were flushed (except when it was allowed to
sleep in console_unlock()).

Regression 2: printk_kthread() takes console_lock() in a preemptible
context => it is allowed to sleep and nobody is able to get
console_lock() in the meantime => slowdown and a risk of lost
messages.

In addition, the printk kthread need not be scheduled at all when:

  + It is assigned to the same CPU as the current console_lock()
    owner.

  + It is assigned to a CPU where some higher-priority task went mad.

Finally, it might be scheduled too late, when we have already handled
too many lines over the atomic limit and switched to the emergency
mode. How big should the atomic limit be to give the kthread a chance
to get scheduled in time? It might be pretty complicated mathematics.
It will depend on the hardware, the system load, the used scheduler,
namespace limitations, ...

> we may (and need to) improve the retry path in console_unlock().
> but we must not leave it until other process locks the console_sem.

Several possibilities came to my mind but they all have problems.
Let me explain.

1. Make printk_kthread() special and avoid sleeping with
   console_lock() taken => force offload when need_resched gets true.

   But this still keeps regression 1. Also it means that we are
   offloading without guarantees.
   But the same is true for sleeping with console_lock().

2. Increase the chance that other printk() callers will take over
   console_lock():

   + Add a delay before console_trylock() in console_unlock().
     But this is very ugly. A waste of resources => no way.

   + console_unlock() might signal offloading, and a printk() caller
     might call console_lock() instead of console_trylock(). It must
     be only one printk() caller (the first one?). It does not make
     sense to block all CPUs and the entire system. Anyway,
     console_lock() might sleep and printk() does not know the
     context => no way.

     The only chance is that we combine this with throttling tasks
     that do too many printk() calls. But it is still ugly and hard
     to predict.

   + Replace console_lock() with a spinlock. But this would require
     a complete rework/cleanup => a task for years.

Did I miss a nice trick?

> > Another detailed description of this problem can be found
> > in my previous mail, see
> > https://lkml.kernel.org/r/20170628121925.GN1538@pathway.suse.cz
> >
> >
> > 2. It adds a rather complex dependency on the scheduler. I know
> >    that my simplified solution does this as well but in another
> >    way[*]. Let me explain. I would split the dependency into the
> >    code side and the behavior side.
> >
> >    From the code side: The current printk() calls
> >    wake_up_process() and we need to use printk_deferred() in the
> >    related scheduler code. This patch does this as well, so there
> >    is no win and no loss. Well, you talk about changing the
> >    affinity and other tricks in the other mails. This might add
> >    more locations where printk_deferred() would be needed.
>
> we are in printk_safe all the way through console_offload_printing(),
> the context is derived from console_unlock().
>
> why we would need printk_deferred()?

I think that the reason is the same why we still need
printk_deferred() now. printk_safe() helps only when the recursion
starts inside printk(). It fails when printk() is called from a
function that is used inside printk().
> >    From the behavior side: The current code wakes the process and
> >    is done. The code in this patch wakes the process and waits
> >    until it[**] gets a CPU and really runs. It switches to the
> >    emergency mode when the other process does not run in time. In
> >    other words, this patch depends on more actions done by the
> >    scheduler and changes behavior based on it. IMHO, this will be
> >    very hard to code, tune, and debug. A proper solution might
> >    require even more code dependency.
> >
> >    [*] My solution depends on the scheduler in the sense that
> >    messages will get lost when nobody else takes over the console
> >    job.
>
> which is precisely and exactly the thing that we should never
> let to happen. there is no _win_, because we _lost_ the messages.

I do not see a way to avoid this at the moment. Therefore I suggest
changes that might reduce the possible damage as much as possible:

  + Reduce sleeping with console_lock() taken => this will actually
    increase the chance to flush the messages in panic().

  + Use a high enough atomic_print_limit so that we prevent blocking
    in atomic context for seconds but trigger offloading only in
    very rare situations. Maybe we could use a number of seconds for
    the limit instead of a number of lines.

    Note that I still consider a flood of messages a broken system.
    If anyone gets into this situation and loses messages, he should
    fix/reduce the flood first. The atomic limit should flush enough
    messages to do so.

  + Allow to throttle printk() callers that cause the flood; this
    looks like a win-win solution; it handles the root of the
    problem.

  + Eventually write on the console that offloading has happened, so
    that we could see when it failed.

  + Add the early printk stuff from Peter Zijlstra so that
    developers have another chance to debug these problems. This is
    needed already now, when printk() fails.

Does this really sound that bad?

Of course, there might be situations where printk() might behave
worse.
But we still should allow to disable offloading completely for these
cases.

Well, I admit that it is hard to be sure. printk() is used in too
many situations. But we either want to do something or we just keep
the status quo.

> > 3. The prevention of soft-lockups is questionable. If you are in
> >    a soft-lockup-prone situation, the only prevention is to do an
> >    offload. But if you switch to the emergency mode and give up
> >    offloading too early, the new code stops preventing the
> >    soft-lockup.
> >
> >    Of course, the patchset does not make it worse but the
> >    question is how much it really helps. It would be bad to add a
> >    lot of code/complexity with almost no gain.
> >
> >
> > IMHO, if we try to solve the 1st problem (chance of offloading),
> > it might add a risk of deadlocks and/or make the 2nd problem
> > (dependency on the scheduler) worse. Also I am afraid that we
> > would repeat many dead ends already tried by Jan Kara.
>
> what deadlock?

The real risk seems to be a missing printk_deferred() needed in
newly used code.

I have also seen some wrongly done hand-shakes of console_lock()
that caused several deadlocks. For example, we might wait for
someone in console_unlock() who will never be scheduled. Or we start
waiting in printk() for console_lock() but there might already be
other waiters in the list, and there might be some dependency
between them. OK, this is rather theoretical.

Anyway, using more code in printk() might cause recursion and
deadlocks. So, we need to be careful.

Uff, all this is complicated. I wish I managed to explain this more
easily.

Best Regards,
Petr

PS: I am going to be off until next Monday. It might take some time
until I process all the other mails. I will try hard to sort the
ideas and keep a clear view.