From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752958AbcHOOg7 (ORCPT ); Mon, 15 Aug 2016 10:36:59 -0400 Received: from ns.mm-sol.com ([37.157.136.199]:40319 "EHLO extserv.mm-sol.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752646AbcHOOg6 (ORCPT ); Mon, 15 Aug 2016 10:36:58 -0400 X-Greylist: delayed 599 seconds by postgrey-1.27 at vger.kernel.org; Mon, 15 Aug 2016 10:36:57 EDT Subject: Re: [PATCH v10 1/2] printk: Make printk() completely async To: Petr Mladek , Viresh Kumar References: <1459789048-1337-1-git-send-email-sergey.senozhatsky@gmail.com> <1459789048-1337-2-git-send-email-sergey.senozhatsky@gmail.com> <20160404155149.a3e3307def2d1315e2099c63@linux-foundation.org> <20160406082758.GA3554@quack.suse.cz> <20160812094447.GD7339@pathway.suse.cz> Cc: Jan Kara , Andrew Morton , Sergey Senozhatsky , Jan Kara , Tejun Heo , Tetsuo Handa , "linux-kernel@vger.kernel.org" , Byungchul Park , Sergey Senozhatsky , Greg Kroah-Hartman From: Vladislav Levenetz Message-ID: <57B1D12A.6030106@mm-sol.com> Date: Mon, 15 Aug 2016 17:26:50 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <20160812094447.GD7339@pathway.suse.cz> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08/12/2016 12:44 PM, Petr Mladek wrote: > On Wed 2016-08-10 14:17:55, Viresh Kumar wrote: >> +Vladi/Greg, >> >> On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara wrote: >>> On Mon 04-04-16 15:51:49, Andrew Morton wrote: >>>>> +static int __init init_printk_kthread(void) >>>>> +{ >>>>> + struct task_struct *thread; >>>>> + >>>>> + if (printk_sync) >>>>> + return 0; >>>>> + >>>>> + thread = kthread_run(printk_kthread_func, NULL, "printk"); >>>> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO >>>> task will block printk for ever. This seems bad. >>> I have to research this a bit but won't the SCHED_FIFO task that has >>> potentially unbounded amount of work lockup the CPU even though it does >>> occasional cond_resched()? >> We are facing complete hogs because of the printk thread being a SCHED_FIFO >> task and have this patch to fix it up for now. >> >> Author: Vladislav Levenetz >> Date: Wed Aug 10 13:58:00 2016 -0700 >> >> SW-7786: printk: Lower the priority of printk thread >> >> Flooding the console (with a test module) in a tight loop indefinitely >> makes android user interface very sluggish. Opening YouTube app and the >> device hangs and becomes even more unresponsive to the point it >> completely hangs. >> >> The asynchronous printk thread is a SCHED FIFO thread with priority >> MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO) >> instead, we observe much better performance using the same printk flood >> test. We don't even notice any kind of sluggishness during device usage. >> We can play a YouTube clip smoothly and use the device normally in >> general. The kernel log looks fine as well, as the flood of messages >> continue normally. >> >> Signed-off-by: Vladislav Levenetz >> Signed-off-by: Viresh Kumar >> --- >> kernel/printk/printk.c | 4 ---- >> 1 file changed, 4 deletions(-) >> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >> index c32872872cb6..ad5b30e5e6d9 100644 >> --- a/kernel/printk/printk.c >> +++ b/kernel/printk/printk.c >> @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data) >> static int __init_printk_kthread(void) >> { >> struct task_struct *thread; >> - struct sched_param param = { >> - .sched_priority = MAX_RT_PRIO - 1, >> - }; >> >> if (!printk_kthread_can_run || printk_sync || printk_kthread) >> return 0; >> @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void) >> return PTR_ERR(thread); >> } >> >> - sched_setscheduler(thread, SCHED_FIFO, ¶m); >> printk_kthread = thread; >> return 0; >> } > IMHO, this is fine. We force the synchronous mode in critical > situations anyway. > > But I was curious if we could hit a printk from the wake_up_process(). > The change above causes using the fair scheduler and there is > the following call chain [*] > > vprintk_emit() > -> wake_up_process() > -> try_to_wake_up() > -> ttwu_queue() > -> ttwu_do_activate() > -> ttwu_activate() > -> activate_task() > -> enqueue_task() > -> enqueue_task_fair() via p->sched_class->enqueue_task > -> cfs_rq_of() > -> task_of() > -> WARN_ON_ONCE(!entity_is_task(se)) > > We should never trigger this because printk_kthread is a task. > But what if the date gets inconsistent? > > Then there is the following chain: > > vprintk_emit() > -> wake_up_process() > -> try_to_wake_up() > -> ttwu_queue() > -> ttwu_do_activate() > -> ttwu_activate() > -> activate_task() > -> enqueue_task() > -> enqueue_task_fair() via p->sched_class->enqueue_task > ->hrtick_update() > -> hrtick_start_fair() > -> WARN_ON(task_rq(p) != rq) > > This looks like another paranoid consistency check that might be > triggered when the scheduler gets messed. > > I see few possible solutions: > > 1. Replace the WARN_ONs by printk_deferred(). > > This is the usual solution but it would make debugging less convenient. > > > 2. Force synchronous printk inside WARN()/BUG() macros. > > This would make sense even from other reasons. These are printed > when the system is in a strange state. There is no guarantee that > the printk_kthread will get scheduled. > > > 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU > printk_func. > > It might be elegant. But we do not want this outside the scheduler > code. Therefore we would need special variants of WARN_*_SCHED() > BUG_*_SCHED() macros. > > > I personally prefer the 2nd solution. What do you think about it, > please? > > > Best Regards, > Petr Hi Petr, Sorry with for the late reply. Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if only a single CPU is running. We already had such a situation with system suspend. During a specific test on our device sometimes we hit a WARN from the time keeping core. (Cannot recall which one exactly. Viresh have it) from a printk wake_up path during system suspend and with already only one CPU running. So we were forced to make printing synchronous in the suspend path prior disabling all non-boot cpu's. Your solution number 2) sounds reasonable to me. I'm wondering if we could hit a WARN()/BUG() somewhere from the fair scheduler like the example you made for the RT sched? Thanks.