Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread

From: Petr Mladek <pmladek@suse.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J . Wysocki" <rjw@rjwysocki.net>,
	Eric Biederman <ebiederm@xmission.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jiri Slaby <jslaby@suse.com>, Pavel Machek <pavel@ucw.cz>,
	Andreas Mohr <andi@lisas.de>,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
Date: Fri, 30 Jun 2017 13:54:57 +0200	[thread overview]
Message-ID: <20170630115457.GE23069@pathway.suse.cz> (raw)
In-Reply-To: <20170629073321.GA475@jagdpanzerIV.localdomain>

On Thu 2017-06-29 16:33:22, Sergey Senozhatsky wrote:
> On (06/28/17 14:19), Petr Mladek wrote:
> [..]
> > > at the same we have better guarantees.
> > > we don't just wakeup(printk_kthread) and leave. we wait for any other
> > > process to re-take the console_sem. until this happens we can't leave
> > > console_unlock().
> > 
> > And this is my problem. I am scared of the waiting. It is very hard
> > to predict, especially without RT priority. But it is tricky anyway,
> > see above.
> 
> but.....
> the opposite possibility is that messages either won't be printed
> soon (until next printk or console_unlock()) or won't be printed
> ever at all (in case of sudden system death). I don't think it's
> a good alternative.

I see it like a weighing machine. There is a "guaranteed" output on
one side and a softlockups prevention on the other side. The more
we prevent the softlockups the less we guarantee the output.

We do not have the same opinion about the balance. My solution
completely prevents softlockups. Your one tries to be more
conservative. It might look that a compromise is better but
we need to consider how it is achieved, what the effect
and side-effects are.

My main unresolved doubts about this patchset are:

1. It gives other processes only very small change to take
   over the job. They either need to call console_trylock()
   in very small "race" window or they need to call
   console_lock(). Where console_lock() only signalizes
   that the caller is willing to take the job and puts
   him into sleep.

   Another detailed description of this problem can be found
   in my previous mail, see
   https://lkml.kernel.org/r/20170628121925.GN1538@pathway.suse.cz

2. It adds rather complex dependency on the scheduler. I know
   that my simplified solution do this as well but another way[*]
   Let me explain. I would split the dependency on the code
   and behavior relation.

   From the code side: The current printk() calls wake_up_process()
   and we need to use printk_deferred() in the related scheduler code.
   This patch does this as well, so there is no win and no lose.
   Well, you talk about changing the affinity and other tricks
   in the other mails. This might add more locations where
   printk_deferred() would be needed.

   From the behavior side: The current code wakes the process
   and is done. The code in this patch wakes the process and
   waits until it[**] gets CPU and really runs. It switches to
   the emergency mode when the other process does not run in time.
   By other words, this patch depends on more actions done
   by the scheduler and changes behavior based on it. IMHO,
   this will be very hard to code, tune, and debug.
   A proper solution might require more code dependency.

   [*] My solution depends on the scheduler in the sense
       that messages will get lost when nobody else will take
       over the console job. The logic is simple, no scheduler
       => only atomic_print_limit messages. It might sound
       drastic but see below. The main win is that it is "simple".

   [**] It is enough when any other process takes over the
       console_lock. But this is tricky, see my 1st point above.

3. The prevention of soft-lockups is questionable. If you are in
   soft-lockup prone situation, the only prevention is to do an
   offload. But if you switch to the emergency mode and give
   up offloading too early, the new code stops preventing
   the softlockup.

   Of course, the patchset does not make it worse but the question
   is how much it really helps. It would be bad to add a lot of
   code/complexity with almost no gain.

IMHO, if we try to solve the 1st problem (chance of offloading),
it might add a risk of deadlocks and/or make the 2nd problem
(dependency on scheduler) worse. Also I am afraid that we would
repeat many dead ways already tried by Jan Kara.

If you will try to improve 3rd problem and make some guaranties
of the soft-lockup prevention, it would make the 2nd problem
(dependency on scheduler) worse. Also the code might be
very hard to understand and tune.

This is why I look for a rather simple solution. IMHO, we both
agree that:

   +  the offload will be activated only when there is
      a flood of messages

   + the only reason to wait for the other handler is to
     better handle sudden death where panic() is not called.

IMHO, the only one who brought the problem of sudden death
was Pavel Machek. AFAIK, he works on embedded systems and
hardware enablement. I guess the combination of the flood
of messages and sudden death is rare there. Also I doubt
that the current code handle it well. The flood is badly
handed in general. In each case, I wonder how long we could
survive flushing messages when there is sudden death
and scheduling does not work.

One problem here is that some questions/doubts are hard to
answer/prove without wide testing.

A compromise might be to start with the simple code
and disable the offloading by default. I am sure that
there will be volunteers that would want to play with it,
e.g. Tetsuo. We would enable it in SUSE as well because
there should not be any regression against what we have
used for years now. We could make it always more complex
according to the feedback and eventually enable it
by default.

Best Regards,
Petr