From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1751981AbcGRLBm (ORCPT ); Mon, 18 Jul 2016 07:01:42 -0400
Received: from mx2.suse.de ([195.135.220.15]:40428 "EHLO mx2.suse.de"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1751855AbcGRLBk (ORCPT ); Mon, 18 Jul 2016 07:01:40 -0400
Date: Mon, 18 Jul 2016 13:01:34 +0200
From: Jan Kara
To: Viresh Kumar
Cc: Jan Kara, Sergey Senozhatsky, Sergey Senozhatsky, rjw@rjwysocki.net,
    Tejun Heo, Greg Kroah-Hartman, Linux Kernel Mailing List,
    vlevenetz@mm-sol.com, vaibhav.hiremath@linaro.org, alex.elder@linaro.org,
    johan@kernel.org, akpm@linux-foundation.org, rostedt@goodmis.org,
    linux-pm@vger.kernel.org, Petr Mladek, Thomas Gleixner
Subject: Re: [Query] Preemption (hogging) of the work handler
Message-ID: <20160718110134.GB6782@quack2.suse.cz>
References: <20160701165959.GR12473@ubuntu>
    <20160701172232.GD28719@htj.duckdns.org>
    <20160706182842.GS2671@ubuntu>
    <20160711102603.GI12410@quack2.suse.cz>
    <20160711154438.GA528@swordfish>
    <20160711223501.GI4695@ubuntu>
    <20160712231903.GR4695@ubuntu>
    <20160713054507.GA563@swordfish>
    <20160714141216.GC13151@quack2.suse.cz>
    <20160714221251.GE3057@ubuntu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160714221251.GE3057@ubuntu>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 14-07-16 15:12:51, Viresh Kumar wrote:
> On 14-07-16, 16:12, Jan Kara wrote:
> > Exactly. Calling printk() from certain parts of the kernel (like scheduler
> > code or timer code) has always been unsafe because printk itself uses these
> > parts and so it can lead to deadlocks. That's why printk_deferred() has
> > been introduced as you mention below.
> >
> > And with sync printk the above deadlock doesn't trigger only by chance - if
> > there happened to be a waiter on console_sem while we suspend, the same
> > deadlock would trigger because up(&console_sem) will try to wake him up and
> > the warning in timekeeping code will cause recursive printk.
> >
> > So I think your patch doesn't really address the real issue - it only
> > works around the particular WARN_ON(timekeeping_enabled) warning but if
> > there was a different warning in timekeeping code which would trigger, it
> > has a potential for causing recursive printk deadlock (and indeed we had
> > such issues previously - see e.g. 504d58745c9c "timer: Fix lock inversion
> > between hrtimer_bases.lock and scheduler locks").
> >
> > So there are IMHO two issues here worth looking at:
> >
> > 1) I didn't find how a wakeup would lead to calling ktime_get() in
> > the current upstream kernel or even current RT kernel. Maybe this is a
> > problem specific to the 3.10 kernel you are using? If yes, we don't have to
> > do anything for current upstream AFAIU.
>
> I haven't checked that earlier, but I see the path in both 3.10 and mainline.
>
> vprintk_emit
>   -> wake_up_process
>   -> try_to_wake_up
>   -> ttwu_queue
>   -> ttwu_do_activate
>   -> ttwu_activate
>   -> activate_task
>   -> enqueue_task (sched/core.c)
>   -> enqueue_task_rt (rt.c)
>   -> enqueue_rt_entity
>   -> __enqueue_rt_entity
>   -> inc_rt_tasks
>   -> inc_rt_group
>   -> start_rt_bandwidth
>   -> start_bandwidth_timer
>   -> __hrtimer_start_range_ns
>   -> ktime_get()

Yeah, you are right.

> > If I just missed how wakeup can call into ktime_get() in current upstream,
> > there is another question:
> >
> > 2) Is it OK that printk calls wakeup so late during suspend?
>
> To clarify again to everybody, we are talking about the place where all
> non-boot CPUs are already hot-unplugged and the last running one has
> disabled interrupts.
>
> I believe that we can't do migration at all now, right?
> What will we get by calling wake_up_process() now anyway?

As I already wrote to Rafael, wake_up_process() will change the process
state to TASK_RUNNING so that it can run after we resume from suspend. But
seeing that the same problem is in upstream I guess what Sergey did makes
more sense if it works for you.

If Sergey's fix does not work for you due to too many messages being printed
during device suspend, then we will have to try something else...

								Honza
-- 
Jan Kara
SUSE Labs, CR