From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752582AbcHLJqK (ORCPT ); Fri, 12 Aug 2016 05:46:10 -0400 Received: from mx2.suse.de ([195.135.220.15]:42705 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752203AbcHLJoz (ORCPT ); Fri, 12 Aug 2016 05:44:55 -0400 Date: Fri, 12 Aug 2016 11:44:47 +0200 From: Petr Mladek To: Viresh Kumar Cc: Jan Kara , Andrew Morton , Sergey Senozhatsky , Jan Kara , Tejun Heo , Tetsuo Handa , "linux-kernel@vger.kernel.org" , Byungchul Park , Sergey Senozhatsky , vlevenetz@mm-sol.com, Greg Kroah-Hartman Subject: Re: [PATCH v10 1/2] printk: Make printk() completely async Message-ID: <20160812094447.GD7339@pathway.suse.cz> References: <1459789048-1337-1-git-send-email-sergey.senozhatsky@gmail.com> <1459789048-1337-2-git-send-email-sergey.senozhatsky@gmail.com> <20160404155149.a3e3307def2d1315e2099c63@linux-foundation.org> <20160406082758.GA3554@quack.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed 2016-08-10 14:17:55, Viresh Kumar wrote: > +Vladi/Greg, > > On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara wrote: > > On Mon 04-04-16 15:51:49, Andrew Morton wrote: > > >> > +static int __init init_printk_kthread(void) > >> > +{ > >> > + struct task_struct *thread; > >> > + > >> > + if (printk_sync) > >> > + return 0; > >> > + > >> > + thread = kthread_run(printk_kthread_func, NULL, "printk"); > >> > >> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO > >> task will block printk for ever. This seems bad. > > > > I have to research this a bit but won't the SCHED_FIFO task that has > > potentially unbounded amount of work lockup the CPU even though it does > > occasional cond_resched()? > > We are facing complete hogs because of the printk thread being a SCHED_FIFO > task and have this patch to fix it up for now. > > Author: Vladislav Levenetz > Date: Wed Aug 10 13:58:00 2016 -0700 > > SW-7786: printk: Lower the priority of printk thread > > Flooding the console (with a test module) in a tight loop indefinitely > makes android user interface very sluggish. Opening YouTube app and the > device hangs and becomes even more unresponsive to the point it > completely hangs. > > The asynchronous printk thread is a SCHED FIFO thread with priority > MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO) > instead, we observe much better performance using the same printk flood > test. We don't even notice any kind of sluggishness during device usage. > We can play a YouTube clip smoothly and use the device normally in > general. The kernel log looks fine as well, as the flood of messages > continue normally. > > Signed-off-by: Vladislav Levenetz > Signed-off-by: Viresh Kumar > --- > kernel/printk/printk.c | 4 ---- > 1 file changed, 4 deletions(-) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index c32872872cb6..ad5b30e5e6d9 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data) > static int __init_printk_kthread(void) > { > struct task_struct *thread; > - struct sched_param param = { > - .sched_priority = MAX_RT_PRIO - 1, > - }; > > if (!printk_kthread_can_run || printk_sync || printk_kthread) > return 0; > @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void) > return PTR_ERR(thread); > } > > - sched_setscheduler(thread, SCHED_FIFO, ¶m); > printk_kthread = thread; > return 0; > } IMHO, this is fine. We force the synchronous mode in critical situations anyway. But I was curious if we could hit a printk from the wake_up_process(). The change above causes using the fair scheduler and there is the following call chain [*] vprintk_emit() -> wake_up_process() -> try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_activate() -> activate_task() -> enqueue_task() -> enqueue_task_fair() via p->sched_class->enqueue_task -> cfs_rq_of() -> task_of() -> WARN_ON_ONCE(!entity_is_task(se)) We should never trigger this because printk_kthread is a task. But what if the date gets inconsistent? Then there is the following chain: vprintk_emit() -> wake_up_process() -> try_to_wake_up() -> ttwu_queue() -> ttwu_do_activate() -> ttwu_activate() -> activate_task() -> enqueue_task() -> enqueue_task_fair() via p->sched_class->enqueue_task ->hrtick_update() -> hrtick_start_fair() -> WARN_ON(task_rq(p) != rq) This looks like another paranoid consistency check that might be triggered when the scheduler gets messed. I see few possible solutions: 1. Replace the WARN_ONs by printk_deferred(). This is the usual solution but it would make debugging less convenient. 2. Force synchronous printk inside WARN()/BUG() macros. This would make sense even from other reasons. These are printed when the system is in a strange state. There is no guarantee that the printk_kthread will get scheduled. 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU printk_func. It might be elegant. But we do not want this outside the scheduler code. Therefore we would need special variants of WARN_*_SCHED() BUG_*_SCHED() macros. I personally prefer the 2nd solution. What do you think about it, please? Best Regards, Petr