From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754361AbdDGHqm (ORCPT ); Fri, 7 Apr 2017 03:46:42 -0400
Received: from mail-pg0-f67.google.com ([74.125.83.67]:33329 "EHLO
	mail-pg0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752245AbdDGHqe (ORCPT );
	Fri, 7 Apr 2017 03:46:34 -0400
Date: Fri, 7 Apr 2017 16:46:34 +0900
From: Sergey Senozhatsky
To: Pavel Machek
Cc: Sergey Senozhatsky , Jan Kara , "Eric W. Biederman" ,
	Sergey Senozhatsky , Ye Xiaolong , Steven Rostedt , Petr Mladek ,
	Andrew Morton , Linus Torvalds , Peter Zijlstra ,
	"Rafael J . Wysocki" , Greg Kroah-Hartman , Jiri Slaby , Len Brown ,
	linux-kernel@vger.kernel.org, lkp@01.org
Subject: Re: [printk] fbc14616f4: BUG:kernel_reboot-without-warning_in_test_stage
Message-ID: <20170407074634.GB1091@jagdpanzerIV.localdomain>
References: <20170330213829.GA21476@inn.lkp.intel.com>
	<20170331023506.GB3493@jagdpanzerIV.localdomain>
	<20170331040438.GA366@jagdpanzerIV.localdomain>
	<20170331063913.GE20961@yexl-desktop>
	<20170331144730.GA10578@tigerII.localdomain>
	<87a881v52o.fsf@xmission.com>
	<20170403093152.GB15168@quack2.suse.cz>
	<20170406173306.GD10363@amd>
	<20170407044334.GA487@jagdpanzerIV.localdomain>
	<20170407071558.GA11792@amd>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170407071558.GA11792@amd>
User-Agent: Mutt/1.8.0 (2017-02-23)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On (04/07/17 09:15), Pavel Machek wrote:
> On Fri 2017-04-07 13:44:40, Sergey Senozhatsky wrote:
> > Hello,
> >
> > On (04/06/17 19:33), Pavel Machek wrote:
> > > > This patch set gives up part of the printk() reliability for bounded
> > > > latency (at least unless we detect we are really in trouble) which is IMHO
> > > > a good trade-off for lots of users (and others can just turn this feature
> > > > off).
> > >
> > > If they can ever realize they were bitten by this feature.
> > >
> > > Can we go for different tradeoff?
> > >
> > > In console_unlock(), if you detect too much work, print "Too many
> > > messages to print, %d bytes delayed" and wake up kernel thread.
> >
> > "too many messages" is undefined. console_unlock() can be called from
> > IRQ handler or with preemption disabled, or under spin_lock, or under
> > RCU read lock, etc. etc. By the time we decide to wake up printk_kthread
> > from console_unlock() it may be already too late.
>
> So let's define "too many messages" as 240 characters. We know printk
> worked rather well for us for more than 20 years. Kernel code is used
> to printk taking few milliseconds.

serial console can be quite slow. and port->lock, which is acquired by
console_unlock()->call_console_drivers()->write(), is also accessible
by serial driver's IRQ handler, and this lock may be busy long enough --
as long as that IRQ handler transmits/receives chars.

but that's not the point.

[..]
> Yeah? So you know modified printk() does not work, that's why
> "emergency mode" exists. Unfortunately, you can't rely on fact that
> you can detect half-crashed machines by printk levels. You usually
> can't.

I'm not happy with those printk_emergency_begin()/end(), sure. but that's
the reality -- every single solution that would offload printing duty implies
that there will be cases when offloading would not be possible. either
PENDING_PRINTK_IPI to other CPUs, or irq_work(PENDING_OUTPUT) on a local CPU,
or anything else (um... what is it?... softirq? tasklet? print one logbuf
entry from every IRQ handler? dunno, anything else?). There will be cases
when we won't be able to expect that something will take over and finish
printing for us.

Well, maybe I'm missing some other solution that would offload printing,
eliminating lockup conditions, and at the same time work in 100% of the
cases.
-ss
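
For readers following the thread, the trade-off under discussion can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not the patch set's code: struct toy_log,
toy_log_init(), toy_console_unlock(), toy_printk_worker() and
DIRECT_PRINT_BUDGET are all hypothetical names. It models the proposal
(print up to 240 characters directly, report how many bytes were delayed,
wake a worker) together with an emergency flag that bypasses offloading.
It does not model the objection raised above, namely that the caller may
be in IRQ context or that port->lock may be busy, so the worker may never
get a chance to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical budget: the "240 characters" suggested in the thread. */
#define DIRECT_PRINT_BUDGET 240

/* Toy stand-in for the kernel log buffer; not the real logbuf. */
struct toy_log {
	const char *buf;	/* pending text */
	size_t len;		/* total bytes in buf */
	size_t next;		/* first byte not yet printed */
	bool emergency;		/* models an emergency begin/end section */
	bool worker_woken;	/* models "wake up printk_kthread" */
};

/* Reset the toy log so that all of @text is pending. */
static void toy_log_init(struct toy_log *log, const char *text)
{
	log->buf = text;
	log->len = strlen(text);
	log->next = 0;
	log->emergency = false;
	log->worker_woken = false;
}

/* Stand-in for a console driver's write(); just emits to stdout. */
static void toy_console_write(const char *s, size_t n)
{
	fwrite(s, 1, n, stdout);
}

/*
 * Print directly, but no more than DIRECT_PRINT_BUDGET bytes unless the
 * emergency flag is set. Anything left over is reported and handed to
 * the worker. Returns the number of bytes printed by this call.
 */
static size_t toy_console_unlock(struct toy_log *log)
{
	size_t todo;

	assert(log->next <= log->len);
	todo = log->len - log->next;
	if (!log->emergency && todo > DIRECT_PRINT_BUDGET)
		todo = DIRECT_PRINT_BUDGET;

	toy_console_write(log->buf + log->next, todo);
	log->next += todo;

	if (log->next < log->len) {
		printf("\nToo many messages to print, %zu bytes delayed\n",
		       log->len - log->next);
		log->worker_woken = true;
	}
	return todo;
}

/* The deferred side: prints whatever the direct path left behind. */
static size_t toy_printk_worker(struct toy_log *log)
{
	size_t todo = log->len - log->next;

	toy_console_write(log->buf + log->next, todo);
	log->next = log->len;
	log->worker_woken = false;
	return todo;
}
```

With 1000 pending bytes, toy_console_unlock() prints 240 of them and sets
worker_woken; toy_printk_worker() then prints the remaining 760. With
emergency set, the same call prints all 1000 bytes itself. The model is
optimistic by construction: it assumes the worker always runs, which is
exactly the assumption the message above argues cannot hold in every case.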