From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753204AbcHPJEi (ORCPT );
        Tue, 16 Aug 2016 05:04:38 -0400
Received: from mx2.suse.de ([195.135.220.15]:38021 "EHLO mx2.suse.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751123AbcHPJEg (ORCPT );
        Tue, 16 Aug 2016 05:04:36 -0400
Date: Tue, 16 Aug 2016 11:04:30 +0200
From: Petr Mladek
To: Vladislav Levenetz
Cc: Viresh Kumar, Jan Kara, Andrew Morton, Sergey Senozhatsky, Jan Kara,
        Tejun Heo, Tetsuo Handa, "linux-kernel@vger.kernel.org",
        Byungchul Park, Sergey Senozhatsky, Greg Kroah-Hartman
Subject: Re: [PATCH v10 1/2] printk: Make printk() completely async
Message-ID: <20160816090430.GK13300@pathway.suse.cz>
References: <1459789048-1337-1-git-send-email-sergey.senozhatsky@gmail.com>
        <1459789048-1337-2-git-send-email-sergey.senozhatsky@gmail.com>
        <20160404155149.a3e3307def2d1315e2099c63@linux-foundation.org>
        <20160406082758.GA3554@quack.suse.cz>
        <20160812094447.GD7339@pathway.suse.cz>
        <57B1D12A.6030106@mm-sol.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <57B1D12A.6030106@mm-sol.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon 2016-08-15 17:26:50, Vladislav Levenetz wrote:
> On 08/12/2016 12:44 PM, Petr Mladek wrote:
> >But I was curious if we could hit a printk from the wake_up_process().
> >The change above causes the fair scheduler to be used and there is
> >the following call chain [*]:
> >
> >   vprintk_emit()
> >     -> wake_up_process()
> >       -> try_to_wake_up()
> >         -> ttwu_queue()
> >           -> ttwu_do_activate()
> >             -> ttwu_activate()
> >               -> activate_task()
> >                 -> enqueue_task()
> >                   -> enqueue_task_fair() via p->sched_class->enqueue_task
> >                     -> cfs_rq_of()
> >                       -> task_of()
> >                         -> WARN_ON_ONCE(!entity_is_task(se))
> >
> >We should never trigger this because printk_kthread is a task.
> >But what if the data gets inconsistent?
> >
> >Then there is the following chain:
> >
> >   vprintk_emit()
> >     -> wake_up_process()
> >       -> try_to_wake_up()
> >         -> ttwu_queue()
> >           -> ttwu_do_activate()
> >             -> ttwu_activate()
> >               -> activate_task()
> >                 -> enqueue_task()
> >                   -> enqueue_task_fair() via p->sched_class->enqueue_task
> >                     -> hrtick_update()
> >                       -> hrtick_start_fair()
> >                         -> WARN_ON(task_rq(p) != rq)
> >
> >This looks like another paranoid consistency check that might be
> >triggered when the scheduler data gets messed up.
> >
> >I see a few possible solutions:
> >
> >1. Replace the WARN_ONs by printk_deferred().
> >
> >   This is the usual solution but it would make debugging less
> >   convenient.
> >
> >
> >2. Force synchronous printk inside WARN()/BUG() macros.
> >
> >   This would make sense for other reasons as well. These messages are
> >   printed when the system is in a strange state. There is no guarantee
> >   that the printk_kthread will get scheduled.
> >
> >
> >3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
> >   printk_func.
> >
> >   It might be elegant. But we do not want this outside the scheduler
> >   code. Therefore we would need special variants of WARN_*_SCHED()
> >   and BUG_*_SCHED() macros.
> >
> >
> >I personally prefer the 2nd solution. What do you think about it,
> >please?
> >
> >
> >Best Regards,
> >Petr
>
> Hi Petr,
>
> Sorry for the late reply.

No problem.

> Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if
> only a single CPU is running.

I think that the deadlock might happen also with more CPUs if the
async printk is enabled. I mean:

   printk_emit()
     wake_up_process()
       try_to_wake_up()
         raw_spin_lock_irqsave(&p->pi_lock, flags)   !!!!
         ttwu_queue()
           ttwu_do_activate()
             ttwu_activate()
               activate_task()
                 enqueue_task()
                   enqueue_task_fair() via p->sched_class->enqueue_task
                     hrtick_update()
                       hrtick_start_fair()
                         WARN_ON(task_rq(p) != rq)
                           printk()
                             vprintk_emit()
                               wake_up_process()
                                 try_to_wake_up()
                                   raw_spin_lock_irqsave(&p->pi_lock, flags)

There is a deadlock because p->pi_lock is already taken by the first
try_to_wake_up(). In other words, I think that the single running CPU
was only a symptom, not the root cause of the deadlock.
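Below is a tiny stand-alone C model of that recursion, just to show the
shape of the problem outside of the kernel. All of the names
(toy_printk, toy_wake_up, pi_lock_held) are invented and it only mimics
the calling pattern, not the real code:

/*
 * Toy model of the recursion above.  Plain C, nothing kernel-specific:
 * pi_lock_held stands in for p->pi_lock, toy_wake_up() for
 * wake_up_process(printk_kthread), and toy_warn() for the WARN_ON()
 * that fires while the lock is already held.
 */
#include <stdio.h>

static int pi_lock_held;                /* p->pi_lock in the real chain */

static void toy_wake_up(void);

/* printk() in async mode: store the message, then wake the printk kthread */
static void toy_printk(const char *msg)
{
        printf("log buffer <- \"%s\"\n", msg);
        toy_wake_up();
}

/* the WARN_ON(task_rq(p) != rq) firing somewhere under enqueue_task_fair() */
static void toy_warn(void)
{
        toy_printk("WARNING: task_rq(p) != rq");
}

/* try_to_wake_up(): take pi_lock, then run the enqueue path */
static void toy_wake_up(void)
{
        if (pi_lock_held) {
                /* the real raw_spin_lock_irqsave() would spin here forever */
                printf("pi_lock already held -> deadlock in the real kernel\n");
                return;
        }

        pi_lock_held = 1;
        toy_warn();                     /* the WARN fires with pi_lock held */
        pi_lock_held = 0;
}

int main(void)
{
        toy_printk("hello");            /* any async printk() starts the chain */
        return 0;
}

When run, it reports that the second attempt to take "pi_lock" finds it
already held, which is exactly the point where the real raw spinlock
would spin forever.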
> We already had such a situation with system suspend. During a
> specific test on our device we sometimes hit a WARN from the
> timekeeping core (cannot recall which one exactly; Viresh has it)
> from a printk wake_up path during system suspend, with only one CPU
> still running.
> So we were forced to make printing synchronous in the suspend path
> prior to disabling all non-boot CPUs.
>
> Your solution number 2) sounds reasonable to me.

Good.

> I'm wondering if we could hit a WARN()/BUG() somewhere from the fair
> scheduler like the example you made for the RT sched?

Unfortunately, it looks like we could. The example above actually is
from the fair scheduler.

Best Regards,
Petr
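PS: To sketch what I mean by 2), here is the same toy model with the
one change that WARN()/BUG() would make: switch printk into a
synchronous mode so that the message is flushed directly instead of
waking the printk kthread. Again, all names are invented and this only
shows the shape of the idea, not a real patch:

/*
 * Same toy model as above, with the one change that 2) is about:
 * force_sync is set by the WARN()/BUG() path, so toy_printk() flushes
 * the message directly instead of waking the printk kthread.
 */
#include <stdio.h>

static int pi_lock_held;
static int force_sync;                  /* would be set by WARN()/BUG() */

static void toy_wake_up(void);

static void toy_printk(const char *msg)
{
        printf("log buffer <- \"%s\"\n", msg);

        if (force_sync) {
                /* synchronous path: flush to the console, no wake-up needed */
                printf("console <- \"%s\"\n", msg);
                return;
        }
        toy_wake_up();
}

/* WARN() would force the synchronous path around its own messages */
static void toy_warn(void)
{
        force_sync = 1;
        toy_printk("WARNING: task_rq(p) != rq");
        force_sync = 0;
}

static void toy_wake_up(void)
{
        if (pi_lock_held) {
                printf("pi_lock already held -> deadlock\n");
                return;
        }

        pi_lock_held = 1;
        toy_warn();                     /* no recursion into toy_wake_up() now */
        pi_lock_held = 0;
}

int main(void)
{
        toy_printk("hello");
        return 0;
}

With force_sync set around the WARN message, toy_printk() never calls
back into the wake-up path, so the nested pi_lock acquisition, and with
it the deadlock, goes away.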