From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751857AbcBJOgx (ORCPT ); Wed, 10 Feb 2016 09:36:53 -0500 Received: from mx2.suse.de ([195.135.220.15]:44939 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751019AbcBJOgw (ORCPT ); Wed, 10 Feb 2016 09:36:52 -0500 Date: Wed, 10 Feb 2016 15:36:49 +0100 From: Petr Mladek To: Denys Vlasenko Cc: linux-kernel@vger.kernel.org, srostedt@redhat.com, Steven Rostedt , Tejun Heo , Peter Hurley , Jan Kara , Sergey Senozhatsky , Andrew Morton , Kyle McMartin , KY Srinivasan , Dave Jones , Calvin Owens Message-ID: <20160210143649.GH3305@pathway.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Bcc: Subject: Re: [PATCH] printk: avoid livelock if another CPU printks continuously Reply-To: In-Reply-To: <1454963703-20433-1-git-send-email-dvlasenk@redhat.com> On Mon 2016-02-08 21:35:03, Denys Vlasenko wrote: > At the end of each printk(), kernel attempts to take console_sem. > If this succeeds, it feeds buffered message data to console devices > until there is nothing left, and releases console_sem: > > if (console_trylock_for_printk(this_cpu)) > console_unlock(); > > The livelock exists because code in console_unlock() has no > limit on the amount of buffered data it would process under > console_sem. This is bad if printk() was called with IRQs disabled. > > This patch makes console_unlock() release console_sem after 5 > iterations, which usually amounts to 5 lines of printk messages, > and give other printk'ing CPUs a chance to acquire console_sem. > > If some CPU grabs it, console_unlock() finishes. > If no one takes the semaphore, console_unlock() re-acquires it > and loops back for another cycle of console output. > > This seems to be a hard-to-trigger, but long-existing problem: Yup, and there are more people trying to handle this. I add some of them into CC. Sadly, the problem is much more complicated that it looks. Jan Kara (jack) has already provided many possible solutions that were not accepted. The last one can be seen at http://thread.gmane.org/gmane.linux.kernel/2105183/focus=2113787 See below some comments to your approach. > Signed-off-by: Denys Vlasenko > CC: linux-kernel@vger.kernel.org > CC: srostedt@redhat.com > CC: Steven Rostedt > CC: Tejun Heo > CC: Peter Hurley > --- > kernel/printk/printk.c | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index c963ba5..ca4f9d55 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -2235,6 +2235,7 @@ void console_unlock(void) > unsigned long flags; > bool wake_klogd = false; > bool do_cond_resched, retry; > + unsigned cnt; > > if (console_suspended) { > up_console_sem(); > @@ -2257,6 +2258,7 @@ void console_unlock(void) > /* flush buffered message fragment immediately to console */ > console_cont_flush(text, sizeof(text)); > again: > + cnt = 5; > for (;;) { > struct printk_log *msg; > size_t ext_len = 0; > @@ -2284,6 +2286,9 @@ skip: > if (console_seq == log_next_seq) > break; > > + if (--cnt == 0) > + break; /* Someone else printk's like crazy */ > + > msg = log_from_idx(console_idx); > if (msg->flags & LOG_NOCONS) { > /* > @@ -2350,6 +2355,26 @@ skip: > if (retry && console_trylock()) > goto again; > > + if (cnt == 0) { > + /* > + * Other CPU(s) printk like crazy, filling log_buf[]. > + * Try to get rid of the "honor" of servicing their data: > + * give _them_ time to grab console_sem and start working. > + */ > + cnt = 9999; > + while (--cnt != 0) { > + cpu_relax(); > + if (console_seq == log_next_seq) { This condition is true when all available messages are printed to the console. It means that there is nothing to do at all. It is quite late. A much better solution would be to store console_seq to a local variable and check it is being modified by an other CPU. > + /* Good, other CPU entered "for(;;)" loop */ > + goto out; > + } > + } > + /* No one seems to be willing to take it... */ > + if (console_trylock()) > + goto again; /* we took it */ > + /* Nope, someone else holds console_sem! Good */ The cycle gives a big chance other CPUs to enter console_unlock(). It means that more CPUs might end up in the above busy cycle. It gives a chance to move the printing to another CPU. It likely slows down the flood of messages because the producer end up here as well. So, it probably works but the performance is far from optimal. Many CPUs might end up doing nothing. I am afraid that this is not the right way to go. Best Regards, Petr