From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754847AbdDGIO6 (ORCPT ); Fri, 7 Apr 2017 04:14:58 -0400 Received: from atrey.karlin.mff.cuni.cz ([195.113.26.193]:54667 "EHLO atrey.karlin.mff.cuni.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752775AbdDGIOw (ORCPT ); Fri, 7 Apr 2017 04:14:52 -0400 Date: Fri, 7 Apr 2017 10:14:49 +0200 From: Pavel Machek To: Sergey Senozhatsky Cc: Jan Kara , "Eric W. Biederman" , Sergey Senozhatsky , Ye Xiaolong , Steven Rostedt , Petr Mladek , Andrew Morton , Linus Torvalds , Peter Zijlstra , "Rafael J . Wysocki" , Greg Kroah-Hartman , Jiri Slaby , Len Brown , linux-kernel@vger.kernel.org, lkp@01.org Subject: Re: [printk] fbc14616f4: BUG:kernel_reboot-without-warning_in_test_stage Message-ID: <20170407081449.GA12859@amd> References: <20170331023506.GB3493@jagdpanzerIV.localdomain> <20170331040438.GA366@jagdpanzerIV.localdomain> <20170331063913.GE20961@yexl-desktop> <20170331144730.GA10578@tigerII.localdomain> <87a881v52o.fsf@xmission.com> <20170403093152.GB15168@quack2.suse.cz> <20170406173306.GD10363@amd> <20170407044334.GA487@jagdpanzerIV.localdomain> <20170407071558.GA11792@amd> <20170407074634.GB1091@jagdpanzerIV.localdomain> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="SLDf9lqlvOQaIe6s" Content-Disposition: inline In-Reply-To: <20170407074634.GB1091@jagdpanzerIV.localdomain> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --SLDf9lqlvOQaIe6s Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri 2017-04-07 16:46:34, Sergey Senozhatsky wrote: > On (04/07/17 09:15), Pavel Machek wrote: > > On Fri 2017-04-07 13:44:40, Sergey Senozhatsky wrote: > > > Hello, > > >=20 > > > On (04/06/17 19:33), Pavel Machek wrote: > > > > > This patch set gives up part of the printk() reliability for boun= ded > > > > > latency (at least unless we detect we are really in trouble) whic= h is IMHO > > > > > a good trade-off for lots of users (and others can just turn this= feature > > > > > off). > > > >=20 > > > > If they can ever realize they were bitten by this feature. > > > >=20 > > > > Can we go for different tradeoff? > > > >=20 > > > > In console_unlock(), if you detect too much work, print "Too many > > > > messages to print, %d bytes delayed" and wake up kernel thread. > > >=20 > > > "too many messages" is undefined. console_unlock() can be called from > > > IRQ handler or with preemtion disabled, or under spin_lock, or under > > > RCU read lock, etc. etc. By the time we decide to wake up printk_kthr= ead > > > from console_unlock() it may be already too late. > >=20 > > So lets define "too many messages" as 240 characters. We know printk > > worked rather well for us for more than 20 years. Kernel code is used > > to printk taking few miliseconds. >=20 > serial console can be quite slow. and port->lock, that is acquired by > console_unlock()->call_console_drivers()->write(), is also accessible > by serial driver's IRQ handler, and this lock may be busy long > enough -- as long as that IRQ handler transmits/receives chars. but > that's not the point. Well. This is what we had for 20 years. > [..] > > Yeah? So you know modified printk() does not work, that's why > > "emergency mode" exists. Unfortunately, you can't rely on fact that > > you can detect half-crashed machines by printk levels. You usually > > can't. >=20 > I'm not happy with those printk_emergency_begin()/end(), sure. but that's > the reality -- every single solution that would offload printing duty imp= lies > that there will be cases when offloading would not be possible. either > PENDING_PRINTK_IPI to other CPUs, or irq_work(PENDING_OUTPUT) on a local = CPU, > or anything else (um... what it is?... softirq? tasklet? print one logbuf > entry from every IRQ handler? dunno, anything else?). There will be cases > when we won't be able to expect that something will take over and finish > printing for us. Well, may be I'm missing some other solution that would > offload printing, eliminating lockup conditions, and at the same time work > in 100% of the cases. I don't have magic solution in my sleeve. You made a good case that spending 30 seconds in printk() is a bad idea. I agree with that. Your solution is to introduce printk_emergency_begin()/end(). I don't agree there. I believe "spend at most 2 seconds in printk(), then print a warning and offload" is a solution closer to what we had before. Pavel --=20 (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blo= g.html --SLDf9lqlvOQaIe6s Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAljnSnkACgkQMOfwapXb+vJRaQCdFefJvdUQAOjYSVAohvzt7jH2 BvcAn2IYh42qvivYtJR+D97167Xdvtnl =TB3E -----END PGP SIGNATURE----- --SLDf9lqlvOQaIe6s-- From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============1559859007473325058==" MIME-Version: 1.0 From: Pavel Machek To: lkp@lists.01.org Subject: Re: [printk] fbc14616f4: BUG:kernel_reboot-without-warning_in_test_stage Date: Fri, 07 Apr 2017 10:14:49 +0200 Message-ID: <20170407081449.GA12859@amd> In-Reply-To: <20170407074634.GB1091@jagdpanzerIV.localdomain> List-Id: --===============1559859007473325058== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable On Fri 2017-04-07 16:46:34, Sergey Senozhatsky wrote: > On (04/07/17 09:15), Pavel Machek wrote: > > On Fri 2017-04-07 13:44:40, Sergey Senozhatsky wrote: > > > Hello, > > > = > > > On (04/06/17 19:33), Pavel Machek wrote: > > > > > This patch set gives up part of the printk() reliability for boun= ded > > > > > latency (at least unless we detect we are really in trouble) whic= h is IMHO > > > > > a good trade-off for lots of users (and others can just turn this= feature > > > > > off). > > > > = > > > > If they can ever realize they were bitten by this feature. > > > > = > > > > Can we go for different tradeoff? > > > > = > > > > In console_unlock(), if you detect too much work, print "Too many > > > > messages to print, %d bytes delayed" and wake up kernel thread. > > > = > > > "too many messages" is undefined. console_unlock() can be called from > > > IRQ handler or with preemtion disabled, or under spin_lock, or under > > > RCU read lock, etc. etc. By the time we decide to wake up printk_kthr= ead > > > from console_unlock() it may be already too late. > > = > > So lets define "too many messages" as 240 characters. We know printk > > worked rather well for us for more than 20 years. Kernel code is used > > to printk taking few miliseconds. > = > serial console can be quite slow. and port->lock, that is acquired by > console_unlock()->call_console_drivers()->write(), is also accessible > by serial driver's IRQ handler, and this lock may be busy long > enough -- as long as that IRQ handler transmits/receives chars. but > that's not the point. Well. This is what we had for 20 years. > [..] > > Yeah? So you know modified printk() does not work, that's why > > "emergency mode" exists. Unfortunately, you can't rely on fact that > > you can detect half-crashed machines by printk levels. You usually > > can't. > = > I'm not happy with those printk_emergency_begin()/end(), sure. but that's > the reality -- every single solution that would offload printing duty imp= lies > that there will be cases when offloading would not be possible. either > PENDING_PRINTK_IPI to other CPUs, or irq_work(PENDING_OUTPUT) on a local = CPU, > or anything else (um... what it is?... softirq? tasklet? print one logbuf > entry from every IRQ handler? dunno, anything else?). There will be cases > when we won't be able to expect that something will take over and finish > printing for us. Well, may be I'm missing some other solution that would > offload printing, eliminating lockup conditions, and at the same time work > in 100% of the cases. I don't have magic solution in my sleeve. You made a good case that spending 30 seconds in printk() is a bad idea. I agree with that. Your solution is to introduce printk_emergency_begin()/end(). I don't agree there. I believe "spend at most 2 seconds in printk(), then print a warning and offload" is a solution closer to what we had before. Pavel -- = (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blo= g.html --===============1559859007473325058== Content-Type: application/pgp-signature MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="signature.asc" LS0tLS1CRUdJTiBQR1AgU0lHTkFUVVJFLS0tLS0KVmVyc2lvbjogR251UEcgdjEKCmlFWUVBUkVD QUFZRkFsam5TbmtBQ2drUU1PZndhcFhiK3ZKUmFRQ2RGZWZKdmRVUUFPallTVkFvaHZ6dDdqSDIK QnZjQW4ySVloNDJxdml2WXRKUitEOTcxNjdYZHZ0bmwKPVRCM0UKLS0tLS1FTkQgUEdQIFNJR05B VFVSRS0tLS0tCg== --===============1559859007473325058==--