Date: Tue, 4 Jul 2017 16:03:16 +0200
From: Petr Mladek
To: Sergey Senozhatsky
Cc: Sergey Senozhatsky, Steven Rostedt, Jan Kara, Andrew Morton,
    Peter Zijlstra, "Rafael J. Wysocki", Eric Biederman,
    Greg Kroah-Hartman, Jiri Slaby, Pavel Machek, Andreas Mohr,
    Tetsuo Handa, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCHv3 2/5] printk: introduce printing kernel thread
Message-ID: <20170704140316.GK23069@pathway.suse.cz>
References: <20170509082859.854-1-sergey.senozhatsky@gmail.com>
 <20170509082859.854-3-sergey.senozhatsky@gmail.com>
 <20170510055935.GA1966@jagdpanzerIV.localdomain>
 <20170529092906.GD21894@pathway.suse.cz>
 <20170531072233.GC7672@jagdpanzerIV.localdomain>
 <20170628121925.GN1538@pathway.suse.cz>
 <20170629073321.GA475@jagdpanzerIV.localdomain>
 <20170630115457.GE23069@pathway.suse.cz>
 <20170630124224.GA792@jagdpanzerIV.localdomain>
In-Reply-To: <20170630124224.GA792@jagdpanzerIV.localdomain>
User-Agent: Mutt/1.5.21 (2010-09-15)
List-ID: linux-kernel@vger.kernel.org

On Fri 2017-06-30 21:42:24, Sergey Senozhatsky wrote:
> Hello,
>
> On (06/30/17 13:54), Petr Mladek wrote:
> > > but.....
> > > the opposite possibility is that messages either won't be printed
> > > soon (until next printk or console_unlock()) or won't be printed
> > > ever at all (in case of sudden system death). I don't think it's
> > > a good alternative.
> >
> > I see it like a weighing machine.
> > There is a "guaranteed" output on one side and softlockup
> > prevention on the other side. The more we prevent the softlockups,
> > the less we guarantee the output.
>
> I apply a very simple litmus test. if the answer to the question
> "so we leave console_unlock() and there are pending messages,
> who and when is going to flush the remaining messages?" is
> "something sometime in the future" then it's a no-no.
>
> "something sometime in the future" is equal to "no one".

I am afraid that it is hard or even impossible to have offloading and
be sure that the other side continues. IMHO, there will always be a
possibility that something could go wrong. I think that insisting on
keeping the current guarantees might force us to keep the current
code (status quo).

Please, let me better explain the problems that I see with the
current code.

> > My main unresolved doubts about this patchset are:
> >
> > 1. It gives other processes only a very small chance to take
> >    over the job. They either need to call console_trylock()
> >    in a very small "race" window or they need to call
> >    console_lock(). Where console_lock() only signals that the
> >    caller is willing to take the job and puts him to sleep.
>
> printk_kthread does console_lock().

But this is not enough. In fact, it makes things worse. Let me
explain:

We are talking about a flood of messages, where we go over
atomic_print_limit and activate offloading.
Let's look at it in more detail:

    CPU0                            CPU1

    console_unlock()

      // handle atomic_print_limit lines
      // enable offloading
      wake_up_process(printk_kthread)

      // handle many other lines
                                    // printk_kthread finally gets CPU
                                    // woken in printk_kthread_func()
                                    console_lock()
                                      down()
                                      // lock most likely taken by CPU0
                                        list_add_tail(&waiter.list)
                                        schedule()

      // handle one more line
      up_console_sem()
        list_del(&waiter.list)
        wake_up_process(waiter->task)

      console_trylock()
      // fails because the lock is
      // taken by the waiter

Regression 1: Nobody flushes the messages to the console until
printk_kthread() is scheduled again, really takes the console_lock(),
and calls console_unlock(). The original code flushed the messages
all the time until all were flushed (except when it was allowed to
sleep in console_unlock()).

Regression 2: printk_kthread() takes console_lock() in a preemptible
context => it is allowed to sleep and nobody is able to get
console_lock() in the meantime => slowdown and a risk of lost
messages.

In addition, the printk kthread need not be scheduled at all when:

  + It is assigned to the same CPU as the current console_lock()
    owner.

  + It is assigned to a CPU where some higher-priority task went mad.

Finally, it might be scheduled too late, when we have already handled
too many lines over the atomic limit and switched to the emergency
mode. How big should the atomic limit be to give the kthread a chance
to get scheduled in time? It might be pretty complicated mathematics.
It will depend on the hardware, the system load, the used scheduler,
namespace limitations, ...

> we may (and need to) improve the retry path in console_unlock().
> but we must not leave it until other process locks the console_sem.

Several possibilities came to my mind but they all have problems.
Let me explain.

1. Make printk_kthread() special and avoid sleeping with
   console_lock() taken => force offload when need_resched gets true.

   But this still keeps regression 1. Also it means that we are
   offloading without guarantees.
   But the same is true for sleeping with console_lock().

2. Increase the chance that other printk() callers will take over
   console_lock():

   + Add a delay before console_trylock() in console_unlock().
     But this is very ugly. A waste of resources => no way.

   + console_unlock() might signal offloading, and a printk() caller
     might call console_lock() instead of console_trylock(). It must
     be only one printk() caller (the first one?). It does not make
     sense to block all CPUs and the entire system. Anyway,
     console_lock() might sleep and printk() does not know the
     context => no way.

     The only chance is that we combine this with throttling tasks
     that do too many printk() calls. But it is still ugly and hard
     to predict.

   + Replace console_lock() with a spinlock. But this would require
     a complete rework/cleanup => a task for years.

Did I miss a nice trick?

> > Another detailed description of this problem can be found
> > in my previous mail, see
> > https://lkml.kernel.org/r/20170628121925.GN1538@pathway.suse.cz
> >
> >
> > 2. It adds a rather complex dependency on the scheduler. I know
> >    that my simplified solution does this as well but in another
> >    way[*]. Let me explain. I would split the dependency into the
> >    code side and the behavior side.
> >
> >    From the code side: The current printk() calls
> >    wake_up_process() and we need to use printk_deferred() in the
> >    related scheduler code. This patch does this as well, so there
> >    is no win and no loss. Well, you talk about changing the
> >    affinity and other tricks in the other mails. This might add
> >    more locations where printk_deferred() would be needed.
>
> we are in printk_safe all the way through console_offload_printing(),
> the context is derived from console_unlock().
>
> why we would need printk_deferred()?

I think that the reason is the same why we still need
printk_deferred() now. printk_safe() helps only when the recursion
starts inside printk(). It fails when printk() is called from a
function that is used inside printk().
> >    From the behavior side: The current code wakes the process and
> >    is done. The code in this patch wakes the process and waits
> >    until it[**] gets a CPU and really runs. It switches to the
> >    emergency mode when the other process does not run in time. In
> >    other words, this patch depends on more actions done by the
> >    scheduler and changes behavior based on it. IMHO, this will be
> >    very hard to code, tune, and debug. A proper solution might
> >    require even more code dependency.
> >
> >    [*] My solution depends on the scheduler in the sense that
> >    messages will get lost when nobody else takes over the console
> >    job.
>
> which is precisely and exactly the thing that we should never
> let to happen. there is no _win_, because we _lost_ the messages.

I do not see a way to avoid this at the moment. Therefore I suggest
changes that might reduce the possible damage as much as possible:

  + Reduce sleeping with console_lock() taken => this will actually
    increase the chance to flush the messages in panic().

  + Use a high enough atomic_print_limit so that we prevent blocking
    in atomic context for seconds but trigger offloading only in
    very rare situations. Maybe we could use a number of seconds for
    the limit instead of a number of lines.

    Note that I still consider a flood of messages a broken system.
    If anyone gets into this situation and loses messages, he should
    fix/reduce the flood first. The atomic limit should flush enough
    messages to do so.

  + Allow to throttle printk() callers that cause the flood; this
    looks like a win-win solution; it handles the root of the
    problem.

  + Eventually write on the console that offloading has happened, so
    that we could see when it failed.

  + Add the early printk stuff from Peter Zijlstra so that
    developers have another chance to debug these problems. This is
    needed already now, when printk() fails.

Does this really sound that bad?

Of course, there might be situations where printk() might behave
worse.
But we still should allow to disable offloading completely for these
cases.

Well, I admit that it is hard to be sure. printk() is used in too
many situations. But we either want to do something or we just keep
the status quo.

> > 3. The prevention of soft-lockups is questionable. If you are in
> >    a soft-lockup-prone situation, the only prevention is to do an
> >    offload. But if you switch to the emergency mode and give up
> >    offloading too early, the new code stops preventing the
> >    soft-lockup.
> >
> >    Of course, the patchset does not make it worse but the
> >    question is how much it really helps. It would be bad to add a
> >    lot of code/complexity with almost no gain.
> >
> >
> > IMHO, if we try to solve the 1st problem (chance of offloading),
> > it might add a risk of deadlocks and/or make the 2nd problem
> > (dependency on the scheduler) worse. Also I am afraid that we
> > would repeat many dead ends already tried by Jan Kara.
>
> what deadlock?

The real risk seems to be a missing printk_deferred() needed in
newly used code.

I have also seen some wrongly done hand-shakes of console_lock()
that caused several deadlocks. For example, we might wait for
someone in console_unlock() who will never be scheduled. Or we start
waiting in printk() for console_lock() but there might already be
other waiters in the list, and there might be some dependency
between them. OK, this is rather theoretical.

Anyway, using more code in printk() might cause recursion and
deadlocks. So, we need to be careful.

Uff, all this is complicated. I wish I managed to explain this more
easily.

Best Regards,
Petr

PS: I am going to be off until next Monday. It might take some time
until I process all the other mails. I will try hard to sort the
ideas and keep a clear view.