linux-kernel.vger.kernel.org archive mirror
* oops pauser.
@ 2006-01-05  4:52 Dave Jones
  2006-01-05  6:10 ` oops pauser. / boot_delayer Randy.Dunlap
                   ` (5 more replies)
  0 siblings, 6 replies; 75+ messages in thread
From: Dave Jones @ 2006-01-05  4:52 UTC (permalink / raw)
  To: linux-kernel

In my quest to get better debug data from users in Fedora bug reports,
I came up with this patch.  A majority of users don't have serial
consoles, so when an oops scrolls off the top of the screen,
and locks up, they usually end up reporting a 2nd (or later) oops
that isn't particularly helpful (or worse, some inconsequential
info like 'sleeping whilst atomic' warnings).

With this patch, if we oops, there's a pause for two minutes,
which hopefully gives people enough time to grab a digital camera
to take a screenshot of the oops.

It has an on-screen timer so the user knows what's going on
(and that it's going to come back to life [maybe] after the oops).

The one case this doesn't catch is the problem of oopses whilst
in X. Previously a non-fatal oops would stall X momentarily,
and then things continue. Now those cases will lock up completely
for two minutes. Future patches could add some additional feedback
during this 'stall', such as blinking the keyboard LEDs or periodic speaker beeps.

Signed-off-by: Dave Jones <davej@redhat.com>

--- vanilla/arch/i386/kernel/traps.c	2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/arch/i386/kernel/traps.c	2006-01-04 23:42:46.000000000 -0500
@@ -256,6 +271,15 @@ void show_registers(struct pt_regs *regs
 		}
 	}
 	printk("\n");
+	{
+		int i;
+		for (i=120;i>0;i--) {
+			mdelay(1000);
+			touch_nmi_watchdog();
+			printk("Continuing in %d seconds. \r", i);
+		}
+		printk("\n");
+	}
 }	
 
 static void handle_BUG(struct pt_regs *regs)

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05  4:52 oops pauser Dave Jones
@ 2006-01-05  6:10 ` Randy.Dunlap
  2006-01-05  7:30   ` Bernd Eckenfels
  2006-01-05  8:15 ` oops pauser Jan Engelhardt
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 75+ messages in thread
From: Randy.Dunlap @ 2006-01-05  6:10 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

On Wed, 4 Jan 2006 23:52:12 -0500 Dave Jones wrote:

> In my quest to get better debug data from users in Fedora bug reports,
> I came up with this patch.  A majority of users don't have serial
> consoles, so when an oops scrolls off the top of the screen,
> and locks up, they usually end up reporting a 2nd (or later) oops
> that isn't particularly helpful (or worse, some inconsequential
> info like 'sleeping whilst atomic' warnings)
> 
> With this patch, if we oops, there's a pause for a two minutes..
> which hopefully gives people enough time to grab a digital camera
> to take a screenshot of the oops.
> 
> It has an on-screen timer so the user knows what's going on,
> (and that it's going to come back to life [maybe] after the oops).
> 
> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes. Future patches could add some additional feedback
> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.

That's nice.  Here's another patch^w hack.

This one delays each printk() during boot by a variable time
(from kernel command line), while system_state == SYSTEM_BOOTING.
Caveat:  it's not terribly SMP safe or SMP nice.
Any ideas for improvements (esp. in the SMP area) are appreciated.
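
For anyone trying to follow the delay arithmetic in the patch below, here is a
rough user-space sketch of it. It is not part of the patch: the function names
are made up for illustration, and HZ is passed in as a parameter instead of
coming from CONFIG_HZ.

```c
#include <stdio.h>

/*
 * Hypothetical user-space helpers that mirror the arithmetic of the
 * boot_delay patch. lpj is "loops per jiffy", hz is jiffies per second.
 */

/* Divide before multiplying, as the patch does: the result is truncated
 * to a multiple of hz, but the intermediate value stays small. */
static unsigned long long loops_per_msec(unsigned long lpj, unsigned int hz)
{
	return (unsigned long long)(lpj / 1000) * hz;
}

/* The patch ignores delays above 10 seconds by resetting them to 0. */
static unsigned int sanitize_boot_delay(unsigned int msecs)
{
	return msecs > 10 * 1000 ? 0 : msecs;
}

/* Upper bound on busy-wait iterations for one delayed printk(). */
static unsigned long long delay_loops(unsigned long lpj, unsigned int hz,
				      unsigned int msecs)
{
	return loops_per_msec(lpj, hz) * sanitize_boot_delay(msecs);
}
```

With lpj=2000000 and HZ=250 this gives 500000 loops per msec, so
boot_delay=100 spins for at most 50000000 iterations after each printk().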

---

From: Randy Dunlap <rdunlap@xenotime.net>

Optionally add a boot delay after each kernel printk() call,
crudely measured in milliseconds, with a maximum delay of
10 seconds per printk.

Enable CONFIG_BOOT_DELAY=y and then add (e.g.):
"lpj=loops_per_jiffy boot_delay=100"
to the kernel command line.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
---

 init/calibrate.c  |    2 +-
 init/main.c       |   25 +++++++++++++++++++++++++
 kernel/printk.c   |   33 +++++++++++++++++++++++++++++++++
 lib/Kconfig.debug |   18 ++++++++++++++++++
 4 files changed, 77 insertions(+), 1 deletion(-)

--- linux-2615-work.orig/init/main.c
+++ linux-2615-work/init/main.c
@@ -557,6 +557,31 @@ static int __init initcall_debug_setup(c
 }
 __setup("initcall_debug", initcall_debug_setup);
 
+#ifdef CONFIG_BOOT_DELAY
+
+unsigned int boot_delay = 0; /* msecs delay after each printk during bootup */
+extern unsigned long preset_lpj; /* defined in init/calibrate.c */
+unsigned long long printk_delay_msec = 0; /* delay loops per msec, derived from lpj */
+
+static int __init boot_delay_setup(char *str)
+{
+	unsigned long lpj = preset_lpj ? preset_lpj : 1000000; /* some guess */
+	unsigned long long loops_per_msec = lpj / 1000 * CONFIG_HZ;
+
+	get_option(&str, &boot_delay);
+	if (boot_delay > 10 * 1000)
+		boot_delay = 0;
+
+	printk_delay_msec = loops_per_msec;
+	printk("boot_delay: %u, preset_lpj: %lu, lpj: %lu, CONFIG_HZ: %d, printk_delay_msec: %llu\n",
+		boot_delay, preset_lpj, lpj, CONFIG_HZ, printk_delay_msec);
+
+	return 1;
+}
+__setup("boot_delay=", boot_delay_setup);
+
+#endif
+
 struct task_struct *child_reaper = &init_task;
 
 extern initcall_t __initcall_start[], __initcall_end[];
--- linux-2615-work.orig/init/calibrate.c
+++ linux-2615-work/init/calibrate.c
@@ -10,7 +10,7 @@
 
 #include <asm/timex.h>
 
-static unsigned long preset_lpj;
+unsigned long preset_lpj;
 static int __init lpj_setup(char *str)
 {
 	preset_lpj = simple_strtoul(str,NULL,0);
--- linux-2615-work.orig/kernel/printk.c
+++ linux-2615-work/kernel/printk.c
@@ -23,6 +23,7 @@
 #include <linux/smp_lock.h>
 #include <linux/console.h>
 #include <linux/init.h>
+#include <linux/jiffies.h>
 #include <linux/module.h>
 #include <linux/interrupt.h>			/* For in_interrupt() */
 #include <linux/config.h>
@@ -201,6 +202,33 @@ out:
 
 __setup("log_buf_len=", log_buf_len_setup);
 
+#ifdef CONFIG_BOOT_DELAY
+
+extern unsigned int boot_delay; /* msecs to delay after each printk during bootup */
+extern unsigned long preset_lpj;
+extern unsigned long long printk_delay_msec;
+
+static void boot_delay_msec(int millisecs)
+{
+	unsigned long long k = printk_delay_msec * millisecs;
+	unsigned long timeout;
+
+	timeout = jiffies + msecs_to_jiffies(millisecs);
+	while (k) {
+		k--;
+		rep_nop();
+		/*
+		 * use (volatile) jiffies to prevent
+		 * compiler reduction; loop termination via jiffies
+		 * is secondary and may or may not happen.
+		 */
+		if (time_after(jiffies, timeout))
+			break;
+	}
+}
+
+#endif
+
 /*
  * Commands to do_syslog:
  *
@@ -520,6 +548,11 @@ asmlinkage int printk(const char *fmt, .
 	r = vprintk(fmt, args);
 	va_end(args);
 
+#ifdef CONFIG_BOOT_DELAY
+	if (boot_delay && system_state == SYSTEM_BOOTING)
+		boot_delay_msec(boot_delay);
+#endif
+
 	return r;
 }
 
--- linux-2615-work.orig/lib/Kconfig.debug
+++ linux-2615-work/lib/Kconfig.debug
@@ -186,6 +186,24 @@ config FRAME_POINTER
 	  some architectures or if you use external debuggers.
 	  If you don't debug the kernel, you can say N.
 
+config BOOT_DELAY
+	bool "Delay each boot message by N milliseconds"
+	depends on DEBUG_KERNEL
+	help
+	  This build option allows you to read kernel boot messages
+	  by inserting a short delay after each one.  The delay is
+	  specified in milliseconds on the kernel command line,
+	  using "boot_delay=N".
+
+	  It is likely that you would also need to use "lpj=M" to preset
+	  the "loops per jiffy" value.
+	  See a previous boot log for the "lpj" value to use for your
+	  system, and then set "lpj=M" before setting "boot_delay=N".
+	  NOTE:  Using this option may adversely affect SMP systems.
+	  I.e., processors other than the first one may not boot up.
+	  BOOT_DELAY also may cause DETECT_SOFTLOCKUP to detect
+	  what it believes to be lockup conditions.
+
 config RCU_TORTURE_TEST
 	tristate "torture tests for RCU"
 	depends on DEBUG_KERNEL


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05  6:10 ` oops pauser. / boot_delayer Randy.Dunlap
@ 2006-01-05  7:30   ` Bernd Eckenfels
  2006-01-05  8:07     ` Jan Engelhardt
                       ` (2 more replies)
  0 siblings, 3 replies; 75+ messages in thread
From: Bernd Eckenfels @ 2006-01-05  7:30 UTC (permalink / raw)
  To: linux-kernel

Randy.Dunlap <rdunlap@xenotime.net> wrote:
> This one delays each printk() during boot by a variable time
> (from kernel command line), while system_state == SYSTEM_BOOTING.

This sounds a bit like an April Fools' joke; what is it meant to do? You can
read the messages in the bootlog and use the scrollback keys, no?

Gruss
Bernd

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05  7:30   ` Bernd Eckenfels
@ 2006-01-05  8:07     ` Jan Engelhardt
  2006-01-06  1:28       ` David Lang
  2006-01-05  9:25     ` Grant Coady
  2006-01-05 11:11     ` Dave Jones
  2 siblings, 1 reply; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-05  8:07 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel


>> This one delays each printk() during boot by a variable time
>> (from kernel command line), while system_state == SYSTEM_BOOTING.
>
>This sounds a bit like a aprils fool joke, what it is meant to do? You can
>read the messages in the bootlog and use the scrollback keys, no?
>
If the end result is a panic, then no, the scrollback keys do not work.
Also note that the kernel generates a lot of noise^W text; if the
start scripts from $YOUR_FAVORITE_DISTRO then fill up the buffer as well,
I can barely scroll back to the top of the kernel log, where it says
  Linux version 2.6.15 (jengelh@gwdg-wb04.gwdg.de) (gcc version 4.0.2 
  20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006

Plus, if you happen to oops away, panic away or just get a "VFS root
unmountable" during kernel _boot_, you cannot use scrollback either.

In other words, scrollback starts working (for me) only once init is spawned.



Jan Engelhardt
-- 
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05  4:52 oops pauser Dave Jones
  2006-01-05  6:10 ` oops pauser. / boot_delayer Randy.Dunlap
@ 2006-01-05  8:15 ` Jan Engelhardt
  2006-01-05 10:33   ` Dave Jones
  2006-01-05 13:37 ` Alan Cox
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-05  8:15 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

>In my quest to get better debug data from users in Fedora bug reports,
>I came up with this patch.  A majority of users don't have serial
>consoles, so when an oops scrolls off the top of the screen,
>and locks up, they usually end up reporting a 2nd (or later) oops
>that isn't particularly helpful (or worse, some inconsequential
>info like 'sleeping whilst atomic' warnings)

Here's something interesting too:
Sometimes, an oops is even longer than 25 rows, and the usual user
does not have
 - VGA mode with a lot of lines (because it's hard to read)
 - FB mode with a lot of lines (slow, and it's also hard to read)

Would it be possible to change the VGA mode to 80x43/80x50/80x60
while in protected mode?

>With this patch, if we oops, there's a pause for a two minutes..
>which hopefully gives people enough time to grab a digital camera
>to take a screenshot of the oops.
>
It would be ideal to have something like BSD's "dump to predefined 
block device on oops", so extraction of oops logs requires neither 
pen-and-paper nor a digital camera. Requires another partition that
can be used for it, though.


>The one case this doesn't catch is the problem of oopses whilst
>in X. Previously a non-fatal oops would stall X momentarily,
>and then things continue. Now those cases will lock up completely
>for two minutes. Future patches could add some additional feedback
>during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.
>
Oh yes, include Stas Sergeev's PCSP patch and play a WAV saying "your box
just crashed, wait two minutes for uh ... an oops you can't grab 
either"(*).

(*) If the oops is longer than 25 lines, ... you can't even use scrollback 
because scrollback is cleared when you change consoles. X runs by default 
on tty7, and the kernel dumps it somewhere else. (And even if it dumped to 
tty7 directly, you would not see it.)


Jan Engelhardt
-- 
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05  7:30   ` Bernd Eckenfels
  2006-01-05  8:07     ` Jan Engelhardt
@ 2006-01-05  9:25     ` Grant Coady
  2006-01-05 15:31       ` Mark Lord
  2006-01-05 11:11     ` Dave Jones
  2 siblings, 1 reply; 75+ messages in thread
From: Grant Coady @ 2006-01-05  9:25 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

On Thu, 05 Jan 2006 08:30:16 +0100, be-news06@lina.inka.de (Bernd Eckenfels) wrote:

>Randy.Dunlap <rdunlap@xenotime.net> wrote:
>> This one delays each printk() during boot by a variable time
>> (from kernel command line), while system_state == SYSTEM_BOOTING.
>
>This sounds a bit like a aprils fool joke, what it is meant to do? You can
>read the messages in the bootlog and use the scrollback keys, no?

No, after oops, console dead, very dead . . . no scrollback :(

Just the image on the screen, until one hits the power or reset 
button.

Very sad,,,  You want a kernel monitor to baby boot process?

Grant.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05  8:15 ` oops pauser Jan Engelhardt
@ 2006-01-05 10:33   ` Dave Jones
  2006-01-05 11:05     ` Jan Engelhardt
                       ` (3 more replies)
  0 siblings, 4 replies; 75+ messages in thread
From: Dave Jones @ 2006-01-05 10:33 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: linux-kernel

On Thu, Jan 05, 2006 at 09:15:02AM +0100, Jan Engelhardt wrote:

 > Here's something interesting too:
 > Sometimes, an oops is even longer than 25 rows, and the usual user
 > does not have
 >  - VGA mode with a lot of lines (because it's hard to read)
 >  - FB mode with a lot of lines (slow, and it's also hard to read)

See the other patch I sent, which halves the number of lines needed
for a backtrace on i386 (like x86-64 uses). This helps too.

 > Is it be possible to change the VGA mode to 80x43/80x50/80x60
 > during protected mode?

After an oops, we can't really rely on anything. What if the
oops came from the console layer, or a framebuffer driver?

 > >With this patch, if we oops, there's a pause for a two minutes..
 > >which hopefully gives people enough time to grab a digital camera
 > >to take a screenshot of the oops.
 > >
 > It would be ideal to have something like BSD's "dump to predefined 
 > block device on oops", so extraction of oops logs requires neither 
 > pen-and-paper nor a digital camera. Requires another partition that
 > can be used for it, though.

I dislike most of the disk dump patches that I've seen out there
because most of them rely on the system being in a decent enough
state to be able to write out blocks of data.

If I had any faith in the sturdiness of the floppy driver, I'd
recommend someone look into a 'dump oops to floppy' patch, but
it too relies on a large part of the system being in a sane
enough state to write blocks out to disk.

 > (*) If the oops is longer than 25 lines, ... you can't even use scrollback 
 > because scrollback is cleared when you change consoles. X runs by default 
 > on tty7, and the kernel dumps it somewhere else. (And even if it dumped to 
 > tty7 directly, you would not see it.)

What to do about oopses whilst in X has been the subject of much
head-scratching for years now.  It's come up at least at the
last two kernel summits, and I'll hazard a guess it'll come up
again this year.  The amount of work necessary to make it all
work on both the kernel side and the X side isn't insubstantial, however,
so I wouldn't count on it working too soon.

Hmm, SuSE/Novell folks, doesn't NLKD take over an X display?
ISTR during a demo at last year's OLS the presenter was flipping
in/out of the debugger between slides. Is there anything
useful there?

		Dave


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 10:33   ` Dave Jones
@ 2006-01-05 11:05     ` Jan Engelhardt
  2006-01-05 12:05       ` Keith Owens
  2006-01-05 15:17       ` Jesper Juhl
  2006-01-05 13:46     ` Kurt Wall
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-05 11:05 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel


>See the other patch I sent which halves the amount of lines needed
>for a backtrace on i386 (like x86-64 uses). This helps too.
>
.oO( Compress the oops, encode it base64 and display that instead )Oo. :-)

> > Is it be possible to change the VGA mode to 80x43/80x50/80x60
> > during protected mode?
>
>After an oops, we can't really rely on anything. What if the
>oops came from the console layer, or a framebuffer driver?
>
Well, setting the video mode can be done (on x86, ugh) with a BIOS call, so
we would not need to run through oops-affected code. But that was my
question: whether this int 0x10 call is possible at all. Think of VBE -
VBE3 is the first version that can be called from protected mode.

>If I had any faith in the sturdyness of the floppy driver, I'd
>recommend someone looked into a 'dump oops to floppy' patch, but
>it too relies on a large part of the system being in a sane
>enough state to write blocks out to disk.
>
Right, sad world. (I look forward to the day someone writes a Morse encoder
that blinks the oops out on the keyboard LEDs.)



Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05  7:30   ` Bernd Eckenfels
  2006-01-05  8:07     ` Jan Engelhardt
  2006-01-05  9:25     ` Grant Coady
@ 2006-01-05 11:11     ` Dave Jones
  2006-01-07 21:44       ` Kurtis D. Rader
  2 siblings, 1 reply; 75+ messages in thread
From: Dave Jones @ 2006-01-05 11:11 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

On Thu, Jan 05, 2006 at 08:30:16AM +0100, Bernd Eckenfels wrote:
 > Randy.Dunlap <rdunlap@xenotime.net> wrote:
 > > This one delays each printk() during boot by a variable time
 > > (from kernel command line), while system_state == SYSTEM_BOOTING.
 > 
 > This sounds a bit like a aprils fool joke, what it is meant to do? You can
 > read the messages in the bootlog and use the scrollback keys, no?

could be handy for those 'I see a few messages that scroll, and the
box instantly reboots' bugs.  Quite rare, but they do happen.

		Dave

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 11:05     ` Jan Engelhardt
@ 2006-01-05 12:05       ` Keith Owens
  2006-01-05 15:17       ` Jesper Juhl
  1 sibling, 0 replies; 75+ messages in thread
From: Keith Owens @ 2006-01-05 12:05 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Dave Jones, linux-kernel

Jan Engelhardt (on Thu, 5 Jan 2006 12:05:08 +0100 (MET)) wrote:
>>
>Right, sad world. (With fun I await the day someone writes a morse encoder 
>that writes oops to keyboard leds.)

It's already been done, for both LEDs and PC speaker.  http://kerneltrap.org/node/575/2355


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05  4:52 oops pauser Dave Jones
  2006-01-05  6:10 ` oops pauser. / boot_delayer Randy.Dunlap
  2006-01-05  8:15 ` oops pauser Jan Engelhardt
@ 2006-01-05 13:37 ` Alan Cox
  2006-01-05 20:52   ` Dave Jones
  2006-01-05 13:58 ` Avishay Traeger
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 75+ messages in thread
From: Alan Cox @ 2006-01-05 13:37 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

On Mer, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
> With this patch, if we oops, there's a pause for a two minutes..
> which hopefully gives people enough time to grab a digital camera
> to take a screenshot of the oops.

This appears to reduce the amount of information available: instead of
spewing to the log and continuing, an oops will now generally hang the box,
which stops the scroll keys or dmesg from being used to get the data
out.

Who is going to wait two minutes for an oops when, for most users, it's
their only box? Instead of pasting reports people will now reboot, or
perhaps send you the half a report they can see (which, because we dump
too much info by default to fit the screen, is also useless).

> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes. 

The console has awareness of graphic/text mode at all times and knows
what is going on. Why not use that information if you must go this way?

Alan


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 10:33   ` Dave Jones
  2006-01-05 11:05     ` Jan Engelhardt
@ 2006-01-05 13:46     ` Kurt Wall
  2006-01-06  1:24     ` David Lang
  2006-01-08 13:38     ` Ville Herva
  3 siblings, 0 replies; 75+ messages in thread
From: Kurt Wall @ 2006-01-05 13:46 UTC (permalink / raw)
  To: Dave Jones, Jan Engelhardt, linux-kernel

On Thu, Jan 05, 2006 at 05:33:39AM -0500, Dave Jones took 0 lines to write:
> 
> If I had any faith in the sturdyness of the floppy driver, I'd
> recommend someone looked into a 'dump oops to floppy' patch, but
> it too relies on a large part of the system being in a sane
> enough state to write blocks out to disk.

Not to mention that an increasing number of systems ship without a
floppy drive.

Kurt
-- 
If you perceive that there are four possible ways in which a procedure
can go wrong, and circumvent these, then a fifth way will promptly
develop.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05  4:52 oops pauser Dave Jones
                   ` (2 preceding siblings ...)
  2006-01-05 13:37 ` Alan Cox
@ 2006-01-05 13:58 ` Avishay Traeger
  2006-01-05 20:54   ` Dave Jones
  2006-01-06  0:19   ` Josef Sipek
  2006-01-05 14:39 ` Kyle McMartin
  2006-01-09 18:43 ` Console debugging wishlist was: " Andi Kleen
  5 siblings, 2 replies; 75+ messages in thread
From: Avishay Traeger @ 2006-01-05 13:58 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

Some comments:
1. I think this is a good idea, since serial consoles can also change
timings.  I have seen several race conditions where the problem goes
away once I add a serial console.
2. Should this be a separate debugging option?
3. Shouldn't you have KERN____ in your printk statements?
4. Wouldn't printing out the message every second make the oops scroll
off the screen, defeating the purpose of the patch?

Avishay Traeger
http://www.fsl.cs.sunysb.edu/~avishay/

On Wed, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
> In my quest to get better debug data from users in Fedora bug reports,
> I came up with this patch.  A majority of users don't have serial
> consoles, so when an oops scrolls off the top of the screen,
> and locks up, they usually end up reporting a 2nd (or later) oops
> that isn't particularly helpful (or worse, some inconsequential
> info like 'sleeping whilst atomic' warnings)
> 
> With this patch, if we oops, there's a pause for a two minutes..
> which hopefully gives people enough time to grab a digital camera
> to take a screenshot of the oops.
> 
> It has an on-screen timer so the user knows what's going on,
> (and that it's going to come back to life [maybe] after the oops).
> 
> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes. Future patches could add some additional feedback
> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.
> 
> Signed-off-by: Dave Jones <davej@redhat.com>
> 
> --- vanilla/arch/i386/kernel/traps.c	2006-01-02 22:21:10.000000000 -0500
> +++ linux-2.6.15/arch/i386/kernel/traps.c	2006-01-04 23:42:46.000000000 -0500
> @@ -256,6 +271,15 @@ void show_registers(struct pt_regs *regs
>  		}
>  	}
>  	printk("\n");
> +	{
> +		int i;
> +		for (i=120;i>0;i--) {
> +			mdelay(1000);
> +			touch_nmi_watchdog();
> +			printk("Continuing in %d seconds. \r", i);
> +		}
> +		printk("\n");
> +	}
>  }	
>  
>  static void handle_BUG(struct pt_regs *regs)


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05  4:52 oops pauser Dave Jones
                   ` (3 preceding siblings ...)
  2006-01-05 13:58 ` Avishay Traeger
@ 2006-01-05 14:39 ` Kyle McMartin
  2006-01-09 18:43 ` Console debugging wishlist was: " Andi Kleen
  5 siblings, 0 replies; 75+ messages in thread
From: Kyle McMartin @ 2006-01-05 14:39 UTC (permalink / raw)
  To: Dave Jones, linux-kernel

On Wed, Jan 04, 2006 at 11:52:12PM -0500, Dave Jones wrote:
>  	printk("\n");
> +	{
> +		int i;
> +		for (i=120;i>0;i--) {
> +			mdelay(1000);
> +			touch_nmi_watchdog();
> +			printk("Continuing in %d seconds. \r", i);
> +		}
> +		printk("\n");
> +	}
>

Nice, this is cool. Though, perhaps it would be better if the loop length
was a command line argument like with panic_timeout? 
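
To make that concrete, here is a rough user-space sketch of the parsing such
an option would need. The parameter name "oops_pause", the 120 second default
(taken from the patch) and the 600 second cap are all made up for
illustration; in the kernel this would be a __setup() handler using
get_option(), not strstr()/strtol().

```c
#include <stdlib.h>
#include <string.h>

#define OOPS_PAUSE_DEFAULT 120	/* seconds, same as the hard-coded loop */
#define OOPS_PAUSE_MAX     600	/* arbitrary cap, an assumption */

/*
 * Look for a hypothetical "oops_pause=N" word in a command line and
 * return N, or the default if it is missing, malformed or out of range.
 */
static int parse_oops_pause(const char *cmdline)
{
	static const char key[] = "oops_pause=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept the key at the start of a space-separated word. */
		if (p == cmdline || p[-1] == ' ') {
			char *end;
			long val = strtol(p + keylen, &end, 10);

			if (end != p + keylen && val >= 0 && val <= OOPS_PAUSE_MAX)
				return (int)val;
			return OOPS_PAUSE_DEFAULT;
		}
		p += keylen;
	}
	return OOPS_PAUSE_DEFAULT;
}
```

A value of 0 would then mean "do not pause at all", which keeps the old
behaviour available for people who do not want the stall.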

Cheers,
	Kyle

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 11:05     ` Jan Engelhardt
  2006-01-05 12:05       ` Keith Owens
@ 2006-01-05 15:17       ` Jesper Juhl
  1 sibling, 0 replies; 75+ messages in thread
From: Jesper Juhl @ 2006-01-05 15:17 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Dave Jones, linux-kernel

On 1/5/06, Jan Engelhardt <jengelh@linux01.gwdg.de> wrote:
>
> >See the other patch I sent which halves the amount of lines needed
> >for a backtrace on i386 (like x86-64 uses). This helps too.
> >
> .oO( Compress the oops, encode it base64 and display that instead )Oo. :-)
>
Not really something we want to do at Oops time, and even if the kernel
was in a sane enough state to actually do it, you've just increased the
amount of work needing to be done to decode the Oops by everyone
receiving/wanting to read it.

I think a better idea is to try and move things around so the most
useful pieces of information are on the last lines of the Oops output
(most likely to not have scrolled off the screen) and also work to
eliminate lines that are not really useful/helpful and maybe try to
cram more info from multiple short lines into a single line.

--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05  9:25     ` Grant Coady
@ 2006-01-05 15:31       ` Mark Lord
  2006-01-05 15:38         ` Avishay Traeger
  0 siblings, 1 reply; 75+ messages in thread
From: Mark Lord @ 2006-01-05 15:31 UTC (permalink / raw)
  To: gcoady; +Cc: Bernd Eckenfels, linux-kernel, davej

Grant Coady wrote:
>
> No, after oops, console dead, very dead . . . no scrollback :(

This mis-feature is beginning to annoy more and more.

I seem to recall that "in the old days" (1990s),
this was NOT the case:  scrollback still worked from oops.

I wonder if perhaps a better feature here would be to fix that again?

Cheers

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05 15:31       ` Mark Lord
@ 2006-01-05 15:38         ` Avishay Traeger
  2006-01-05 19:15           ` Mark Lord
  0 siblings, 1 reply; 75+ messages in thread
From: Avishay Traeger @ 2006-01-05 15:38 UTC (permalink / raw)
  To: Mark Lord; +Cc: gcoady, Bernd Eckenfels, linux-kernel, davej

On Thu, 2006-01-05 at 10:31 -0500, Mark Lord wrote:
> Grant Coady wrote:
> >
> > No, after oops, console dead, very dead . . . no scrollback :(
> 
> This mis-feature is beginning to annoy more and more.
> 
> I seem to recall that "in the old days" (1990s),
> this was NOT the case:  scrollback still worked from oops.

I am able to scroll up on the console for most regular oopses, but not
panics.  Am I missing something here?

Avishay Traeger
http://www.fsl.cs.sunysb.edu/~avishay/


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05 15:38         ` Avishay Traeger
@ 2006-01-05 19:15           ` Mark Lord
  0 siblings, 0 replies; 75+ messages in thread
From: Mark Lord @ 2006-01-05 19:15 UTC (permalink / raw)
  To: Avishay Traeger; +Cc: gcoady, Bernd Eckenfels, linux-kernel, davej

Avishay Traeger wrote:
> On Thu, 2006-01-05 at 10:31 -0500, Mark Lord wrote:
> 
>>Grant Coady wrote:
>>
>>>No, after oops, console dead, very dead . . . no scrollback :(
>>
>>This mis-feature is beginning to annoy more and more.
>>
>>I seem to recall that "in the old days" (1990s),
>>this was NOT the case:  scrollback still worked from oops.

s/oops/panic/

> I am able to scroll up on the console for most regular oopses, but not
> panics.  Am I missing something here?

No, I meant "panics".

Cheers

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 13:37 ` Alan Cox
@ 2006-01-05 20:52   ` Dave Jones
  2006-01-06 13:31     ` Alan Cox
  2006-01-06 15:22     ` Pavel Machek
  0 siblings, 2 replies; 75+ messages in thread
From: Dave Jones @ 2006-01-05 20:52 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Thu, Jan 05, 2006 at 01:37:33PM +0000, Alan Cox wrote:
 > On Mer, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
 > > With this patch, if we oops, there's a pause for a two minutes..
 > > which hopefully gives people enough time to grab a digital camera
 > > to take a screenshot of the oops.
 > 
 > This appears to reduce the amount of information available as an oops
 > instead of spewing to the log

A huge number of oopses never hit the logs.
They either hit early in boot, before syslog is even running, or
they kill the box.
 
 > and continuing generally will hang the box
 > stopping the scroll keys being used or dmesg being used to get the data
 > out. 

This is exactly the problem this patch addresses.
The 'scroll keys' do not work in cases where we lock up after an oops.

If the useful parts of the oops scrolled off the top of the screen, we've
lost any chance of debugging whatever just happened. 

 > Who is going to wait two minutes for an oops when for most users its
 > their only box.

The real world disagrees with you. In the few weeks it's been in Fedora,
several previously undiagnosable oopses were caught, and even *users*
agreed it was a useful addition.  If two minutes is excessive, we can
lower it, or even make it a boot option.

Another possibility is instantly continuing after a keypress.

 > Instead of pasting reports people will now reboot, or
 > perhaps send you the half a report they can see (which because we dump
 > too much info by default to fit the screen is also useless).

See the other patch, which halves the number of lines needed for a backtrace.
With that, even if the user is running a 25-line display, we've
a pretty good chance it'll fit, except for really long backtraces;
if that's the case, we can ask users to try to reproduce after
booting with vga=1 (or better, vga=791, for example).

 > > The one case this doesn't catch is the problem of oopses whilst
 > > in X. Previously a non-fatal oops would stall X momentarily,
 > > and then things continue. Now those cases will lock up completely
 > > for two minutes. 
 > 
 > The console has awareness of graphic/text mode at all times and knows
 > what is going on. Why not use that information if you must go this way ?

If we've just oopsed, the console may have no awareness of what day it is,
let alone anything about video modes. I'm not entirely sure what you're
suggesting, but it gives me the creeps. Are you talking about switching
away from X back to a tty when we oops?

		Dave


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 13:58 ` Avishay Traeger
@ 2006-01-05 20:54   ` Dave Jones
  2006-01-06  0:19   ` Josef Sipek
  1 sibling, 0 replies; 75+ messages in thread
From: Dave Jones @ 2006-01-05 20:54 UTC (permalink / raw)
  To: Avishay Traeger; +Cc: linux-kernel

On Thu, Jan 05, 2006 at 08:58:53AM -0500, Avishay Traeger wrote:
 > Some comments:
 > 1. I think this is a good idea, since serial consoles can also change
 > timings.  I have seen several race conditions where the problem goes
 > away once I add a serial console.
 > 2. Should this be a separate debugging option?

maybe

 > 3. Shouldn't you have KERN____ in your printk statements?

doesn't make a great deal of difference in this context.

 > 4. Wouldn't printing out the message every second make the oops scroll
 > off the screen, defeating the purpose of the patch?

no. that's why it uses \r instead of \n.

		Dave


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 13:58 ` Avishay Traeger
  2006-01-05 20:54   ` Dave Jones
@ 2006-01-06  0:19   ` Josef Sipek
  2006-01-06  1:12     ` Bernd Eckenfels
  1 sibling, 1 reply; 75+ messages in thread
From: Josef Sipek @ 2006-01-06  0:19 UTC (permalink / raw)
  To: Avishay Traeger; +Cc: Dave Jones, linux-kernel

On Thu, Jan 05, 2006 at 08:58:53AM -0500, Avishay Traeger wrote:
> Some comments:
> 1. I think this is a good idea, since serial consoles can also change
> timings.  I have seen several race conditions where the problem goes
> away once I add a serial console.

Agreed.

> 2. Should this be a separate debugging option?

Agreed.

> 3. Shouldn't you have KERN____ in your printk statements?

That's something to watch out for... If you have, say:

printk(KERN_DEBUG "fooo.....");
do_foo();
printk(KERN_DEBUG "done.\n");

Then, you'll get the extra "<7>" on the screen and in the logs (assuming
you set the printk levels to display KERN_DEBUG).

Now, I'm not 100% sure about '\r', but I suspect it does the same thing.

> 4. Wouldn't printing out the message every second make the oops scroll
> off the screen, defeating the purpose of the patch?

No, read the patch carefully: it uses '\r' to go back to the beginning of
the line and overwrites the message.
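
To illustrate the '\r' point outside the kernel, here is a small userspace
sketch. The format string is the one from the patch; the function names
format_countdown() and count_newlines_in_countdown() are made up for this
example and do not exist in the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format one countdown status line. The line ends in '\r' (carriage
 * return) rather than '\n', so the next write starts at column 0 of the
 * same screen row and overwrites it instead of scrolling.
 * Returns the number of characters written (excluding the NUL).
 */
int format_countdown(char *buf, size_t len, int secs_left)
{
	return snprintf(buf, len, "Continuing in %d seconds. \r", secs_left);
}

/* Count how many line feeds a run of 'total' updates would emit. */
int count_newlines_in_countdown(int total)
{
	char line[64];
	int newlines = 0;

	for (int i = total; i > 0; i--) {
		format_countdown(line, sizeof(line), i);
		for (const char *p = line; *p; p++)
			if (*p == '\n')
				newlines++;
	}
	return newlines;
}
```

Since no update contains a line feed, the countdown itself never pushes the
oops up the screen. The trailing space before '\r' also appears to cover
the step from a three-digit to a two-digit count, so no stray character is
left behind when the line gets shorter.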

Jeff.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-06  0:19   ` Josef Sipek
@ 2006-01-06  1:12     ` Bernd Eckenfels
  2006-01-06  1:35       ` Josef Sipek
  0 siblings, 1 reply; 75+ messages in thread
From: Bernd Eckenfels @ 2006-01-06  1:12 UTC (permalink / raw)
  To: linux-kernel

Josef Sipek <jsipek@fsl.cs.sunysb.edu> wrote:
> That's something to watch out for...If you say have:
> 
> printk(KERN_DEBUG "fooo.....");
> do_foo();
> printk(KERN_DEBUG "done.\n");

Don't do it. It is better to have the time stamps for both and to have atomic
prints. In fact I would disallow this and add automatic linebreaks.

Gruss
Bernd

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 10:33   ` Dave Jones
  2006-01-05 11:05     ` Jan Engelhardt
  2006-01-05 13:46     ` Kurt Wall
@ 2006-01-06  1:24     ` David Lang
  2006-01-06  1:41       ` Josef Sipek
  2006-01-08 13:38     ` Ville Herva
  3 siblings, 1 reply; 75+ messages in thread
From: David Lang @ 2006-01-06  1:24 UTC (permalink / raw)
  To: Dave Jones; +Cc: Jan Engelhardt, linux-kernel

On Thu, 5 Jan 2006, Dave Jones wrote:

> > (*) If the oops is longer than 25 lines, ... you can't even use scrollback
> > because scrollback is cleared when you change consoles. X runs by default
> > on tty7, and the kernel dumps it somewhere else. (And even if it dumped to
> > tty7 directly, you would not see it.)
>
> What to do about oopses whilst in X has been the subject of much
> head-scratching for years now.  It's come up at least at the
> last two kernel summits, and I'll hazard a guess it'll come up
> again this year.  The amount of work necessary to make it all
> work on both kernel side and X side isn't unsubstantial however,
> so I wouldn't count on it working too soon.

hmm, if you can hope that someone will grab a camera to report an oops, 
how about them grabbing a tape recorder/mp3 recorder to record audio from 
the speaker. it's not fast, but you don't have that much data to output, 
do it in morse (with the audio explanation of what's going to happen
first)

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05  8:07     ` Jan Engelhardt
@ 2006-01-06  1:28       ` David Lang
  2006-01-06  5:36         ` Dave Jones
  2006-01-06  7:36         ` Jan Engelhardt
  0 siblings, 2 replies; 75+ messages in thread
From: David Lang @ 2006-01-06  1:28 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Bernd Eckenfels, linux-kernel

On Thu, 5 Jan 2006, Jan Engelhardt wrote:

> Also note that the kernel generates a lot of noise^W text - if now the
> start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> the top of the kernel when it says
>  Linux version 2.6.15 (jengelh@gwdg-wb04.gwdg.de) (gcc version 4.0.2
>  20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006

enable a few different types of encryption and you have to enlarge the 
buffer (by quite a bit). the fact that all the encryption tests print 
several lines each out and can't be turned off (short of a quiet boot 
where you lose everything) is one of the more annoying things to me right
now.

this large boot message issue also slows your boot significantly if you 
have a fast box that has a serial console, it takes a long time to dump 
all that info out the serial port.

David Lang



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-06  1:12     ` Bernd Eckenfels
@ 2006-01-06  1:35       ` Josef Sipek
  2006-01-06  2:21         ` Bernd Eckenfels
  0 siblings, 1 reply; 75+ messages in thread
From: Josef Sipek @ 2006-01-06  1:35 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

On Fri, Jan 06, 2006 at 02:12:59AM +0100, Bernd Eckenfels wrote:
> Josef Sipek <jsipek@fsl.cs.sunysb.edu> wrote:
> > That's something to watch out for...If you say have:
> > 
> > printk(KERN_DEBUG "fooo.....");
> > do_foo();
> > printk(KERN_DEBUG "done.\n");
> 
> dont do it. It is better to have the time stamps for both and to have atomic
> prints.

First of all, the above code is just to illustrate a point. As a matter of
fact, it may not even work: if some other kernel thread prints something while
do_foo() is executing, the whole thing will get screwed up.

If I remember correctly, the second printk in the "sample" code will _NOT_
produce a timestamp. So, the output will be:

[1234567.123456] fooo.....<7>done.

where, the timestamp is that of the first printk.
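
A userspace sketch of why the "<7>" shows up mid-line. KERN_DEBUG is just
the string literal "<7>" glued onto the message by the compiler, so the
second printk() hands the log a string that begins with those three
characters. SKETCH_KERN_DEBUG, sketch_emit() and sketch_split_printk() are
names invented for this illustration, and the model below covers only the
concatenation, not the real printk() level handling or timestamps.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel macro; in 2.6-era kernels KERN_DEBUG is "<7>". */
#define SKETCH_KERN_DEBUG "<7>"

/*
 * Append one printk-style message to a log buffer without interpreting
 * it. This models only the concatenation, not real printk() parsing.
 */
void sketch_emit(char *log, size_t len, const char *msg)
{
	size_t used = strlen(log);

	if (used < len)
		snprintf(log + used, len - used, "%s", msg);
}

/* Build the visible text the two split printk() calls would produce. */
void sketch_split_printk(char *log, size_t len)
{
	if (len == 0)
		return;
	log[0] = '\0';
	/* First call starts a line; its level prefix is assumed consumed. */
	sketch_emit(log, len, "fooo.....");
	/* Second call lands mid-line: its prefix is just more text. */
	sketch_emit(log, len, SKETCH_KERN_DEBUG "done.\n");
}
```

Dropping the level from the second call, i.e. printk("done.\n"), avoids the
stray prefix, at the cost of the interleaving problem described above.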

> In fact I would disallow this and add automatic linebreaks.

I wouldn't go that far. I'd just let the kernel janitors have fun with
the existing code :)

Jeff.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-06  1:24     ` David Lang
@ 2006-01-06  1:41       ` Josef Sipek
  0 siblings, 0 replies; 75+ messages in thread
From: Josef Sipek @ 2006-01-06  1:41 UTC (permalink / raw)
  To: David Lang; +Cc: Dave Jones, Jan Engelhardt, linux-kernel

On Thu, Jan 05, 2006 at 05:24:01PM -0800, David Lang wrote:
> On Thu, 5 Jan 2006, Dave Jones wrote:
> 
> >> (*) If the oops is longer than 25 lines, ... you can't even use 
> >scrollback
> >> because scrollback is cleared when you change consoles. X runs by default
> >> on tty7, and the kernel dumps it somewhere else. (And even if it dumped 
> >to
> >> tty7 directly, you would not see it.)
> >
> >What to do about oopses whilst in X has been the subject of much
> >head-scratching for years now.  It's come up at least at the
> >last two kernel summits, and I'll hazard a guess it'll come up
> >again this year.  The amount of work necessary to make it all
> >work on both kernel side and X side isn't unsubstantial however,
> >so I wouldn't count on it working too soon.
> 
> hmm, if you can hope that someone will grab a camera to report an oops, 
> how about them grabbing a tape recorder/mp3 recorder to record audio from 
> the speaker. it's not fast, but you don't have that much data to output, 
> do it in morse (with the audio explination of what's going to happen 
> first)

There is a patch somewhere that uses the keyboard lights to "display" panics,
and a comment that the PC speaker implementation is left up to the reader :)

It shouldn't be hard to do, then all you need is just one printk telling the user
to record it :)

Jeff.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-06  1:35       ` Josef Sipek
@ 2006-01-06  2:21         ` Bernd Eckenfels
  0 siblings, 0 replies; 75+ messages in thread
From: Bernd Eckenfels @ 2006-01-06  2:21 UTC (permalink / raw)
  To: linux-kernel

Josef Sipek <jsipek@fsl.cs.sunysb.edu> wrote:
> First of all, the above code is to just illustrate a point. And as a matter of
> fact it may not even work if some other kernel thread prints something while
> do_foo() is executing, the whole thing will get screwed up.

That's another reason not to do it. And to me this means we do not need to
support or optimize for this kind of printk abuse.

> If I remember correctly, I the second line of the "sample" code, will _NOT_
> produce a timestamp. So, the output will be:
> 
> [1234567.123456] fooo.....<7>done.

> where, the timestamp is that of the first printk.

Yes, that's the other problem: you miss the timestamp for the end of a long
running operation. That's why it is better to have that in two lines (maybe
the second line with lower severity).

Gruss
Bernd

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-06  1:28       ` David Lang
@ 2006-01-06  5:36         ` Dave Jones
  2006-01-06  7:00           ` David Lang
  2006-01-08 13:21           ` Pavel Machek
  2006-01-06  7:36         ` Jan Engelhardt
  1 sibling, 2 replies; 75+ messages in thread
From: Dave Jones @ 2006-01-06  5:36 UTC (permalink / raw)
  To: David Lang; +Cc: Jan Engelhardt, Bernd Eckenfels, linux-kernel

On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
 > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
 > 
 > >Also note that the kernel generates a lot of noise^W text - if now the
 > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
 > >the top of the kernel when it says
 > > Linux version 2.6.15 (jengelh@gwdg-wb04.gwdg.de) (gcc version 4.0.2
 > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
 > 
 > enable a few different types of encryption and you have to enlarge the 
 > buffer (by quite a bit). the fact that all the encryption tests print 
 > several lines each out and can't be turned off (short of a quiet boot 
 > where you loose everything) is one of the more annoying things to me right 
 > now.
 > 
 > this large boot message issue also slows your boot significantly if you 
 > have a fast box that has a serial console, it takes a long time to dump 
 > all that info out the serial port.

So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.

		Dave


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-06  5:36         ` Dave Jones
@ 2006-01-06  7:00           ` David Lang
  2006-01-08 13:21           ` Pavel Machek
  1 sibling, 0 replies; 75+ messages in thread
From: David Lang @ 2006-01-06  7:00 UTC (permalink / raw)
  To: Dave Jones; +Cc: Jan Engelhardt, Bernd Eckenfels, linux-kernel

On Fri, 6 Jan 2006, Dave Jones wrote:

> On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
> > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
> >
> > >Also note that the kernel generates a lot of noise^W text - if now the
> > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> > >the top of the kernel when it says
> > > Linux version 2.6.15 (jengelh@gwdg-wb04.gwdg.de) (gcc version 4.0.2
> > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
> >
> > enable a few different types of encryption and you have to enlarge the
> > buffer (by quite a bit). the fact that all the encryption tests print
> > several lines each out and can't be turned off (short of a quiet boot
> > where you loose everything) is one of the more annoying things to me right
> > now.
> >
> > this large boot message issue also slows your boot significantly if you
> > have a fast box that has a serial console, it takes a long time to dump
> > all that info out the serial port.
>
> So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.
>

I've looked for such a config option and not found it in menuconfig. I'll 
take another look.

Ok, I found it. The help isn't clear about exactly what this does. Adding
a blurb that you probably want it off unless you are developing a crypto
module, or that it's intended as a debugging tool, would help clarify it.

Thanks.

   David Lang




^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-06  1:28       ` David Lang
  2006-01-06  5:36         ` Dave Jones
@ 2006-01-06  7:36         ` Jan Engelhardt
  2006-01-06  8:33           ` David Lang
  1 sibling, 1 reply; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-06  7:36 UTC (permalink / raw)
  To: David Lang; +Cc: Bernd Eckenfels, linux-kernel


> this large boot message issue also slows your boot significantly if you have a
> fast box that has a serial console, it takes a long time to dump all that info
> out the serial port.

Don't blame the kernel that serial is slow.



Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-06  7:36         ` Jan Engelhardt
@ 2006-01-06  8:33           ` David Lang
  0 siblings, 0 replies; 75+ messages in thread
From: David Lang @ 2006-01-06  8:33 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Bernd Eckenfels, linux-kernel

On Fri, 6 Jan 2006, Jan Engelhardt wrote:

>> this large boot message issue also slows your boot significantly if you have a
>> fast box that has a serial console, it takes a long time to dump all that info
>> out the serial port.
>
> Don't blame the kernel that serial is slow.

The complaint wasn't that serial was slow; it was a comment on the
amount of data being displayed during a boot (which turned out to be in
large part because I had a verbose config option turned on).

David Lang



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 20:52   ` Dave Jones
@ 2006-01-06 13:31     ` Alan Cox
  2006-01-06 20:33       ` Dave Jones
  2006-01-06 15:22     ` Pavel Machek
  1 sibling, 1 reply; 75+ messages in thread
From: Alan Cox @ 2006-01-06 13:31 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

On Thu, 2006-01-05 at 15:52 -0500, Dave Jones wrote:
> The huge number of oopses never hit the logs.
> They either hit early in boot before syslog is even running, or
> they kill the box.

So you don't need a two minute delay for those because as you said it
froze the box
>  
>  > and continuing generally will hang the box
>  > stopping the scroll keys being used or dmesg being used to get the data
>  > out. 
> 
> This is exactly the problem this patch addresses.
> The 'scroll keys' do not work in cases where we lock up after an oops.

And in those cases the 2 minute freeze is meaningless

> The real-world disagrees with you. In the few weeks it's been in Fedora,
> several previously undiagnosable oopses were caught, and even *users*
> agreed it was a useful addition.   If the two minutes is excessive, we can
> lower it, or even make it a boot-option.

Any change will capture different oopses. A boot option isn't a bad idea,
or for that matter also truncating the call trace to the *top* few (or
as Bryce suggested on irc reversing the printing order)

> Another possibility is instantly continuing after a keypress.

If the input layer is running that would be sensible.

>  > The console has awareness of graphic/text mode at all times and knows
>  > what is going on. Why not use that information if you must go this way ?
> 
> If we've just oopsed, the console may have no awareness of what day it is,
> yet alone anything about video modes. I'm not entirely sure what you're
> suggesting, but it gives me the creeps. Are you talking about switching
> away from X back to a tty when we oops?

Well you could try and do that but I was more thinking that if the
console has been told we are in graphics mode then the 2 minute delay
shouldn't occur.

Alan


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 20:52   ` Dave Jones
  2006-01-06 13:31     ` Alan Cox
@ 2006-01-06 15:22     ` Pavel Machek
  2006-01-06 19:06       ` Jan Engelhardt
  2006-01-06 22:48       ` Dave Jones
  1 sibling, 2 replies; 75+ messages in thread
From: Pavel Machek @ 2006-01-06 15:22 UTC (permalink / raw)
  To: Dave Jones, Alan Cox, linux-kernel

Hi!

>  > > The one case this doesn't catch is the problem of oopses whilst
>  > > in X. Previously a non-fatal oops would stall X momentarily,
>  > > and then things continue. Now those cases will lock up completely
>  > > for two minutes. 
>  > 
>  > The console has awareness of graphic/text mode at all times and knows
>  > what is going on. Why not use that information if you must go this way ?
> 
> If we've just oopsed, the console may have no awareness of what day it is,
> yet alone anything about video modes. I'm not entirely sure what you're
> suggesting, but it gives me the creeps. Are you talking about switching
> away from X back to a tty when we oops?

No.

But you _know_ if user is running X or not -- notice that kernel does
not attempt to printk() when X is running, because that could lock up
the box.

If user is running X, you don't need the delay.

if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
	delay(10sec)
}

or something like that should do the trick.
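
Spelled out a little further as a userspace sketch: struct sketch_vc and
oops_pause_secs() are hypothetical stand-ins, not kernel API. The real
check would use struct vc_data and CON_IS_VISIBLE() as in the snippet
above, and returning 0 when no console is known is an assumption of this
sketch, not something settled in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values match KD_TEXT / KD_GRAPHICS in <linux/kd.h>. */
#define SKETCH_KD_TEXT     0x00
#define SKETCH_KD_GRAPHICS 0x01

/* Hypothetical, cut-down stand-in for the kernel's struct vc_data. */
struct sketch_vc {
	int  vc_mode;   /* SKETCH_KD_TEXT or SKETCH_KD_GRAPHICS */
	bool visible;   /* models what CON_IS_VISIBLE(vc) would report */
};

/*
 * Return how long to pause after an oops, in seconds.
 * The pause only makes sense when a text console is on screen;
 * under X (graphics mode) nobody can read the oops anyway.
 */
int oops_pause_secs(const struct sketch_vc *vc, int configured_secs)
{
	if (vc == NULL)
		return 0;               /* no known console: assume no pause */
	if (vc->visible && vc->vc_mode == SKETCH_KD_TEXT)
		return configured_secs;
	return 0;
}
```

The caller would then loop for the returned number of seconds, as the
original patch does, and skip the loop entirely when the result is 0.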
								Pavel

-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-06 15:22     ` Pavel Machek
@ 2006-01-06 19:06       ` Jan Engelhardt
  2006-01-06 22:34         ` Pavel Machek
  2006-01-06 22:48       ` Dave Jones
  1 sibling, 1 reply; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-06 19:06 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Dave Jones, Alan Cox, linux-kernel

>No.
>
>But you _know_ if user is running X or not -- notice that kernel does
>not attempt to printk() when X is running, because that could lock up
>the box.
>
>If user is running X, you don't need the delay.
>
>if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {

Does framebuffer fall under KD_TEXT?


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-06 13:31     ` Alan Cox
@ 2006-01-06 20:33       ` Dave Jones
  0 siblings, 0 replies; 75+ messages in thread
From: Dave Jones @ 2006-01-06 20:33 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Fri, Jan 06, 2006 at 01:31:10PM +0000, Alan Cox wrote:
 > On Thu, 2006-01-05 at 15:52 -0500, Dave Jones wrote:
 > > The huge number of oopses never hit the logs.
 > > They either hit early in boot before syslog is even running, or
 > > they kill the box.
 > 
 > So you don't need a two minute delay for those because as you said it
 > froze the box

it froze *AFTER* the oops had scrolled off the top of the screen.

The sequence of events before

oops
scrolly scrolly
random crap about sleeping whilst atomic or the like
scrolly scrolly
HANG

with this patch..

oops
*pause for two minutes whilst user takes a picture/scribbles it down*
scrolly scrolly
random crap about sleeping whilst atomic or the like
scrolly scrolly
HANG

 > >  > and continuing generally will hang the box
 > >  > stopping the scroll keys being used or dmesg being used to get the data
 > >  > out. 
 > > 
 > > This is exactly the problem this patch addresses.
 > > The 'scroll keys' do not work in cases where we lock up after an oops.
 > 
 > And in those cases the 2 minute freeze is meaningless

It isn't meaningless if it stops the oops scrolling off the screen for long
enough to capture it.

 > > Another possibility is instantly continuing after a keypress.
 > If the input layer is running that would be sensible.

Yeah, questionable. And polling hardware won't work due to usb keyboards.

 > > If we've just oopsed, the console may have no awareness of what day it is,
 > > yet alone anything about video modes. I'm not entirely sure what you're
 > > suggesting, but it gives me the creeps. Are you talking about switching
 > > away from X back to a tty when we oops?
 > 
 > Well you could try and do that but I was more thinking that if the
 > console has been told we are in graphics mode then the 2 minute delay
 > shouldn't occur.

Hmm. I'll look into that.
Any pointers ? (I don't want to spend longer than necessary looking
in that code :-)

		Dave


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-06 19:06       ` Jan Engelhardt
@ 2006-01-06 22:34         ` Pavel Machek
  0 siblings, 0 replies; 75+ messages in thread
From: Pavel Machek @ 2006-01-06 22:34 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Dave Jones, Alan Cox, linux-kernel

On Fri 06-01-06 20:06:36, Jan Engelhardt wrote:
> >No.
> >
> >But you _know_ if user is running X or not -- notice that kernel does
> >not attempt to printk() when X is running, because that could lock up
> >the box.
> >
> >If user is running X, you don't need the delay.
> >
> >if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
> 
> Does framebuffer fall under KD_TEXT?

I think so.

-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-06 15:22     ` Pavel Machek
  2006-01-06 19:06       ` Jan Engelhardt
@ 2006-01-06 22:48       ` Dave Jones
  1 sibling, 0 replies; 75+ messages in thread
From: Dave Jones @ 2006-01-06 22:48 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Alan Cox, linux-kernel

On Fri, Jan 06, 2006 at 04:22:03PM +0100, Pavel Machek wrote:
 > Hi!
 > 
 > >  > > The one case this doesn't catch is the problem of oopses whilst
 > >  > > in X. Previously a non-fatal oops would stall X momentarily,
 > >  > > and then things continue. Now those cases will lock up completely
 > >  > > for two minutes. 
 > >  > 
 > >  > The console has awareness of graphic/text mode at all times and knows
 > >  > what is going on. Why not use that information if you must go this way ?
 > > 
 > > If we've just oopsed, the console may have no awareness of what day it is,
 > > yet alone anything about video modes. I'm not entirely sure what you're
 > > suggesting, but it gives me the creeps. Are you talking about switching
 > > away from X back to a tty when we oops?
 > 
 > No.
 > 
 > But you _know_ if user is running X or not -- notice that kernel does
 > not attempt to printk() when X is running, because that could lock up
 > the box.
 > 
 > If user is running X, you don't need the delay.
 > 
 > if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
 > 	delay(10sec)
 > }

From this context though, we don't have a 'vc' to reference,
so we'll need to find out from the console layer somehow, which
is the current vc.

		Dave


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-05 11:11     ` Dave Jones
@ 2006-01-07 21:44       ` Kurtis D. Rader
  2006-01-07 21:48         ` Arjan van de Ven
  2006-01-07 22:27         ` Bernd Eckenfels
  0 siblings, 2 replies; 75+ messages in thread
From: Kurtis D. Rader @ 2006-01-07 21:44 UTC (permalink / raw)
  To: Dave Jones, Bernd Eckenfels, linux-kernel

On Thu, 2006-01-05 06:11:05, Dave Jones wrote:
> On Thu, Jan 05, 2006 at 08:30:16AM +0100, Bernd Eckenfels wrote:
>  > Randy.Dunlap <rdunlap@xenotime.net> wrote:
>  > > This one delays each printk() during boot by a variable time
>  > > (from kernel command line), while system_state == SYSTEM_BOOTING.
>  >
>  > This sounds a bit like a aprils fool joke, what it is meant to do? You can
>  > read the messages in the bootlog and use the scrollback keys, no?
> 
> could be handy for those 'I see a few messages that scroll, and the
> box instantly reboots' bugs.  Quite rare, but they do happen.

Another very common situation is a system which fails to boot due to
failures to find the root filesystem. This can happen because of device name
slippage, root disk not being found, the proper HBA driver isn't present in
the initrd image, etc. The customer calls us and reports the last thing they
see on the screen:

    Mounting root filesystem
    Kmod : failed to exec /sbin/modprobe -s -k block-major-8 , error = 2
    mount : error 6 mounting ext3
    pivotroot : pivot_root(/sysroot,.sysroot/initrd) failed : 2
    Freeing unused memory
    Kernel panic : No init found . Try passing init= option to kernel

Great! Only problem is the info we really need has already scrolled off the
screen. An option to pause briefly after each boot time printk would be very
useful.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-07 21:44       ` Kurtis D. Rader
@ 2006-01-07 21:48         ` Arjan van de Ven
  2006-01-07 22:00           ` Kurtis D. Rader
  2006-01-08 23:29           ` David Lang
  2006-01-07 22:27         ` Bernd Eckenfels
  1 sibling, 2 replies; 75+ messages in thread
From: Arjan van de Ven @ 2006-01-07 21:48 UTC (permalink / raw)
  To: Kurtis D. Rader; +Cc: Dave Jones, Bernd Eckenfels, linux-kernel

On Sat, 2006-01-07 at 13:44 -0800, Kurtis D. Rader wrote:
> On Thu, 2006-01-05 06:11:05, Dave Jones wrote:
> > On Thu, Jan 05, 2006 at 08:30:16AM +0100, Bernd Eckenfels wrote:
> >  > Randy.Dunlap <rdunlap@xenotime.net> wrote:
> >  > > This one delays each printk() during boot by a variable time
> >  > > (from kernel command line), while system_state == SYSTEM_BOOTING.
> >  >
> >  > This sounds a bit like a aprils fool joke, what it is meant to do? You can
> >  > read the messages in the bootlog and use the scrollback keys, no?
> > 
> > could be handy for those 'I see a few messages that scroll, and the
> > box instantly reboots' bugs.  Quite rare, but they do happen.
> 
> Another very common situation is a system which fails to boot due to
> failures to find the root filesystem. This can happen because of device name
> slippage, root disk not being found, the proper HBA driver isn't present in

mount by label fixes some of that but not all

> the initrd image, etc. The customer calls us and reports the last thing they
> see on the screen:

fwiw it would make sense (at least for distros) to make this print a
more helpful text about potential causes etc, rather than just making
people say "the kernel panicked".



^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-07 21:48         ` Arjan van de Ven
@ 2006-01-07 22:00           ` Kurtis D. Rader
  2006-01-08 23:29           ` David Lang
  1 sibling, 0 replies; 75+ messages in thread
From: Kurtis D. Rader @ 2006-01-07 22:00 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Dave Jones, Bernd Eckenfels, linux-kernel

On Sat, 2006-01-07 22:48:08, Arjan van de Ven wrote:
> On Sat, 2006-01-07 at 13:44 -0800, Kurtis D. Rader wrote:
> > 
> > Another very common situation is a system which fails to boot due to
> > failures to find the root filesystem. This can happen because of device name
> > slippage, root disk not being found, the proper HBA driver isn't present in
> 
> mount by label fixes some of that but not all

The "not all" case is important. Especially since the potential causes
of being unable to find the root filesystem keep increasing with each
new capability.  And it isn't just failures involving finding the rootfs
that can be problematic to debug without more context than is on the
final screen image.

> > the initrd image, etc. The customer calls us and reports the last thing they
> > see on the screen:
> 
> fwiw it would make sense (at least for distros) to make this print a
> more helpful text about potential causes etc, rather than just making
> people say "the kernel paniced".

That might be useful for people who don't have support contracts. It
wouldn't help customer support teams like the one I'm a member of. We know what
those potential reasons are. The challenge is having enough context to
quickly determine which possible explanation accounts for the failure.
Ideally every customer would have a serial console configured. But a)
most customers don't/won't/can't configure one and b) on many systems a
serial console is not available.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-07 21:44       ` Kurtis D. Rader
  2006-01-07 21:48         ` Arjan van de Ven
@ 2006-01-07 22:27         ` Bernd Eckenfels
  1 sibling, 0 replies; 75+ messages in thread
From: Bernd Eckenfels @ 2006-01-07 22:27 UTC (permalink / raw)
  To: linux-kernel

On Sat, Jan 07, 2006 at 01:44:39PM -0800, Kurtis D. Rader wrote:
> Great! Only problem is the info we really need has already scrolled of the
> screen. An option to pause briefly after each boot time printk would be very
> useful.

I don't think so. It is too much to read to a support person by phone, and
somebody who can diagnose that himself knows exactly where the root is
searched for in his config. After all, it can only be a hardware or driver
problem.

I think it makes much more sense to allow scrollback than to delay
printouts. (And I am quite sure scrollback works in this case)

Gruss
Bernd
-- 
  (OO)     -- Bernd_Eckenfels@Mörscher_Strasse_8.76185Karlsruhe.de --
 ( .. )    ecki@{inka.de,linux.de,debian.org}  http://www.eckes.org/
  o--o   1024D/E383CD7E  eckes@IRCNet  v:+497211603874  f:+49721151516129
(O____O)  When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-06  5:36         ` Dave Jones
  2006-01-06  7:00           ` David Lang
@ 2006-01-08 13:21           ` Pavel Machek
  2006-01-08 19:30             ` Josef Sipek
  1 sibling, 1 reply; 75+ messages in thread
From: Pavel Machek @ 2006-01-08 13:21 UTC (permalink / raw)
  To: Dave Jones, David Lang, Jan Engelhardt, Bernd Eckenfels, linux-kernel

On Pá 06-01-06 00:36:09, Dave Jones wrote:
> On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
>  > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
>  > 
>  > >Also note that the kernel generates a lot of noise^W text - if now the
>  > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
>  > >the top of the kernel when it says
>  > > Linux version 2.6.15 (jengelh@gwdg-wb04.gwdg.de) (gcc version 4.0.2
>  > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
>  > 
>  > enable a few different types of encryption and you have to enlarge the 
>  > buffer (by quite a bit). the fact that all the encryption tests print 
>  > several lines each out and can't be turned off (short of a quiet boot 
>  > where you loose everything) is one of the more annoying things to me right 
>  > now.
>  > 
>  > this large boot message issue also slows your boot significantly if you 
>  > have a fast box that has a serial console, it takes a long time to dump 
>  > all that info out the serial port.
> 
> So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.

Maybe even with CRYPTO_TEST enabled we could only report _failures_?
									Pavel
-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-05 10:33   ` Dave Jones
                       ` (2 preceding siblings ...)
  2006-01-06  1:24     ` David Lang
@ 2006-01-08 13:38     ` Ville Herva
  2006-01-08 13:53       ` Randy.Dunlap
  3 siblings, 1 reply; 75+ messages in thread
From: Ville Herva @ 2006-01-08 13:38 UTC (permalink / raw)
  To: linux-kernel

On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
> 
> If I had any faith in the sturdyness of the floppy driver, I'd
> recommend someone looked into a 'dump oops to floppy' patch, but
> it too relies on a large part of the system being in a sane
> enough state to write blocks out to disk.

I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
minimal 16-bit floppy driver to save the oops dump. 

Kmsgdump has been around for ages and still works with 2.6.x. I almost
always use it (all of my boxes still have floppy drives.)


-- v -- 

v@iki.fi


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-08 13:38     ` Ville Herva
@ 2006-01-08 13:53       ` Randy.Dunlap
  2006-01-08 19:35         ` Jan Engelhardt
  2006-01-08 19:40         ` Grant Coady
  0 siblings, 2 replies; 75+ messages in thread
From: Randy.Dunlap @ 2006-01-08 13:53 UTC (permalink / raw)
  To: vherva; +Cc: linux-kernel

On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:

> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
> > 
> > If I had any faith in the sturdyness of the floppy driver, I'd
> > recommend someone looked into a 'dump oops to floppy' patch, but
> > it too relies on a large part of the system being in a sane
> > enough state to write blocks out to disk.
> 
> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> minimal 16-bit floppy driver to save the oops dump. 

It just switches to real mode and uses BIOS calls.

> Kmsgdump has been around for ages and still works with 2.6.x. I almost
> always use it (all of my boxes still have floppy drives.)


---
~Randy

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-08 13:21           ` Pavel Machek
@ 2006-01-08 19:30             ` Josef Sipek
  2006-01-08 23:08               ` Pavel Machek
  0 siblings, 1 reply; 75+ messages in thread
From: Josef Sipek @ 2006-01-08 19:30 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Dave Jones, David Lang, Jan Engelhardt, Bernd Eckenfels, linux-kernel

On Sun, Jan 08, 2006 at 02:21:32PM +0100, Pavel Machek wrote:
> On Pá 06-01-06 00:36:09, Dave Jones wrote:
> > On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
> >  > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
> >  > 
> >  > >Also note that the kernel generates a lot of noise^W text - if now the
> >  > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> >  > >the top of the kernel when it says
> >  > > Linux version 2.6.15 (jengelh@gwdg-wb04.gwdg.de) (gcc version 4.0.2
> >  > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
> >  > 
> >  > enable a few different types of encryption and you have to enlarge the 
> >  > buffer (by quite a bit). the fact that all the encryption tests print 
> >  > several lines each out and can't be turned off (short of a quiet boot 
> >  > where you loose everything) is one of the more annoying things to me right 
> >  > now.
> >  > 
> >  > this large boot message issue also slows your boot significantly if you 
> >  > have a fast box that has a serial console, it takes a long time to dump 
> >  > all that info out the serial port.
> > 
> > So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.
> 
> Maybe even with CRYPTO_TEST enabled we could only report _failures_?

Why? As far as I know, it is intended for developers as a regression test. I say
if you don't like the output, make the thing a module or don't compile it at all.

Jeff.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-08 13:53       ` Randy.Dunlap
@ 2006-01-08 19:35         ` Jan Engelhardt
  2006-01-09  1:43           ` Randy.Dunlap
  2006-01-08 19:40         ` Grant Coady
  1 sibling, 1 reply; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-08 19:35 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: vherva, linux-kernel

>> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
>> minimal 16-bit floppy driver to save the oops dump. 
>
>It just switches to real mode and uses BIOS calls.
>

This technique btw is what I suggested (switch to 80x50 vga mode
(if not in X)) in case of a longer oops trace.


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-08 13:53       ` Randy.Dunlap
  2006-01-08 19:35         ` Jan Engelhardt
@ 2006-01-08 19:40         ` Grant Coady
  2006-01-09  1:45           ` Randy.Dunlap
  1 sibling, 1 reply; 75+ messages in thread
From: Grant Coady @ 2006-01-08 19:40 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: vherva, linux-kernel

On Sun, 8 Jan 2006 05:53:22 -0800, "Randy.Dunlap" <rdunlap@xenotime.net> wrote:

>On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:
>
>> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
>> > 
>> > If I had any faith in the sturdyness of the floppy driver, I'd
>> > recommend someone looked into a 'dump oops to floppy' patch, but
>> > it too relies on a large part of the system being in a sane
>> > enough state to write blocks out to disk.
>> 
>> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
>> minimal 16-bit floppy driver to save the oops dump. 
>
>It just switches to real mode and uses BIOS calls.

So would it be viable to take over the screen in similar fashion?

Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops 
screen, or Poops for short :o)

Grant.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-08 19:30             ` Josef Sipek
@ 2006-01-08 23:08               ` Pavel Machek
  2006-01-08 23:39                 ` Josef Sipek
  0 siblings, 1 reply; 75+ messages in thread
From: Pavel Machek @ 2006-01-08 23:08 UTC (permalink / raw)
  To: Josef Sipek
  Cc: Dave Jones, David Lang, Jan Engelhardt, Bernd Eckenfels, linux-kernel

On Ne 08-01-06 14:30:00, Josef Sipek wrote:
> On Sun, Jan 08, 2006 at 02:21:32PM +0100, Pavel Machek wrote:
> > On Pá 06-01-06 00:36:09, Dave Jones wrote:
> > > On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
> > >  > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
> > >  > 
> > >  > >Also note that the kernel generates a lot of noise^W text - if now the
> > >  > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> > >  > >the top of the kernel when it says
> > >  > > Linux version 2.6.15 (jengelh@gwdg-wb04.gwdg.de) (gcc version 4.0.2
> > >  > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
> > >  > 
> > >  > enable a few different types of encryption and you have to enlarge the 
> > >  > buffer (by quite a bit). the fact that all the encryption tests print 
> > >  > several lines each out and can't be turned off (short of a quiet boot 
> > >  > where you loose everything) is one of the more annoying things to me right 
> > >  > now.
> > >  > 
> > >  > this large boot message issue also slows your boot significantly if you 
> > >  > have a fast box that has a serial console, it takes a long time to dump 
> > >  > all that info out the serial port.
> > > 
> > > So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.
> > 
> > Maybe even with CRYPTO_TEST enabled we could only report _failures_?
> 
> Why? As far as I know, it is intended for developers as a regression test. I say
> if you don't like the output, make the thing a module or don't compile it at all.

I don't like the output, but if it only reported failures, I could
leave it running and potentially catch some strange failures. Is
reporting successes actually useful?
							Pavel

-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-07 21:48         ` Arjan van de Ven
  2006-01-07 22:00           ` Kurtis D. Rader
@ 2006-01-08 23:29           ` David Lang
  1 sibling, 0 replies; 75+ messages in thread
From: David Lang @ 2006-01-08 23:29 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Kurtis D. Rader, Dave Jones, Bernd Eckenfels, linux-kernel

On Sat, 7 Jan 2006, Arjan van de Ven wrote:

> On Sat, 2006-01-07 at 13:44 -0800, Kurtis D. Rader wrote:
>> On Thu, 2006-01-05 06:11:05, Dave Jones wrote:
>>> On Thu, Jan 05, 2006 at 08:30:16AM +0100, Bernd Eckenfels wrote:
>>> > Randy.Dunlap <rdunlap@xenotime.net> wrote:
>>> >> This one delays each printk() during boot by a variable time
>>> >> (from kernel command line), while system_state == SYSTEM_BOOTING.
>>> >
>>> > This sounds a bit like a aprils fool joke, what it is meant to do? You can
>>> > read the messages in the bootlog and use the scrollback keys, no?
>>>
>>> could be handy for those 'I see a few messages that scroll, and the
>>> box instantly reboots' bugs.  Quite rare, but they do happen.
>>
>> Another very common situation is a system which fails to boot due to
>> failures to find the root filesystem. This can happen because of device name
>> slippage, root disk not being found, the proper HBA driver isn't present in
>
> mount by label fixes some of that but not all

There appears to be a limit on how many disks get checked for their label.
I've got one system with two RAID cards, each with 8 drives on it, and
another RAID card with my boot disk on it.

Depending on how the two RAID cards are arranged, the boot disk can be anything
from sdc to sdq. Mounting by label works for sdc, but not for sdq.

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser. / boot_delayer
  2006-01-08 23:08               ` Pavel Machek
@ 2006-01-08 23:39                 ` Josef Sipek
  0 siblings, 0 replies; 75+ messages in thread
From: Josef Sipek @ 2006-01-08 23:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Dave Jones, David Lang, Jan Engelhardt, Bernd Eckenfels, linux-kernel

On Mon, Jan 09, 2006 at 12:08:27AM +0100, Pavel Machek wrote:
> On Ne 08-01-06 14:30:00, Josef Sipek wrote:
> > On Sun, Jan 08, 2006 at 02:21:32PM +0100, Pavel Machek wrote:
> > > On Pá 06-01-06 00:36:09, Dave Jones wrote:
> > > > So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.
> > > 
> > > Maybe even with CRYPTO_TEST enabled we could only report _failures_?
> > 
> > Why? As far as I know, it is intended for developers as a regression test. I say
> > if you don't like the output, make the thing a module or don't compile it at all.
> 
> I don't like the output, but if it only reported failures, I could
> leave it running and potentially catch some strange failures.

I agree that it is useful to know about strange failures. However, I still maintain
that _if_ the module is intended as a regression test for developers, then the
excessive (?) output is fair. I think the most logical course of action is to
have a verbosity module parameter which defaults to displaying errors only, but still
allows developers to get all the information they need.
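
A minimal userspace sketch of that verbosity idea, not kernel code: the
names (test_verbosity, selftest, run_selftests) are made up for
illustration and are not taken from the crypto test module. Failures are
always printed; successes only when asked for.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical verbosity levels; not taken from the kernel's crypto tests. */
enum test_verbosity { VERBOSE_ERRORS_ONLY = 0, VERBOSE_ALL = 1 };

struct selftest {
	const char *name;
	int (*run)(void);	/* returns 0 on success, non-zero on failure */
};

/*
 * Run every test.  Failures are always reported; successes are reported
 * only when verbosity is VERBOSE_ALL.  Returns the number of failures.
 */
static int run_selftests(const struct selftest *tests, int count,
			 enum test_verbosity verbosity, FILE *out)
{
	int i, failures = 0;

	assert(tests != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (tests[i].run() != 0) {
			failures++;
			fprintf(out, "selftest %s: FAILED\n", tests[i].name);
		} else if (verbosity == VERBOSE_ALL) {
			fprintf(out, "selftest %s: passed\n", tests[i].name);
		}
	}
	return failures;
}

/* Two stand-in tests so the sketch can be exercised. */
static int always_pass(void) { return 0; }
static int always_fail(void) { return 1; }
```

With the default (errors only), a clean boot would print nothing from the
tests, while a developer could still switch everything back on.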

> Is reporting successes actually useful?

Then I propose: :)


diff -r b4fca0ece97f kernel/sys.c
--- a/kernel/sys.c	Sat Oct 22 19:24:10 2005 +0300
+++ b/kernel/sys.c	Sun Jan  8 18:26:49 2006 -0500
@@ -436,7 +436,6 @@
 void kernel_halt(void)
 {
 	kernel_halt_prepare();
-	printk(KERN_EMERG "System halted.\n");
 	machine_halt();
 }
 EXPORT_SYMBOL_GPL(kernel_halt);

Jeff.


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-08 19:35         ` Jan Engelhardt
@ 2006-01-09  1:43           ` Randy.Dunlap
  0 siblings, 0 replies; 75+ messages in thread
From: Randy.Dunlap @ 2006-01-09  1:43 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: vherva, linux-kernel

On Sun, 8 Jan 2006 20:35:08 +0100 (MET) Jan Engelhardt wrote:

> >> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> >> minimal 16-bit floppy driver to save the oops dump. 
> >
> >It just switches to real mode and uses BIOS calls.
> >
> 
> This technique btw is what I suggested (switch to 80x50 vga mode
> (if not in X)) in case of a longer oops trace.

kmsgdump already shows all of the kernel log buffer that is in
memory (has not been written to disk, basically).

If I (or we) had some time and motivation, I have a
contributed patch to kmsgdump that:

a.  saves and dumps all of the kernel log buffer
    (reminder:  current dump targets are display, parallel port
    printer, and legacy floppy disk)
b.  adds a hard disk dump target and attempts to make this safe
    by pre-reserving and writing each block of it with a
    signature + block number (and maybe more, I'm not sure
    right now)
c.  adds x86-64 support

but I have not merged this code into kmsgdump yet, nor have
I even tested it.  I can't test the x86-64 support since I
don't (yet) have an x86-64 system available for this.
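
One way to read item (b), as a userspace sketch only: the block size,
the "KMSGDUMP" marker and both function names are assumptions made for
illustration, since the contributed patch itself is not shown in this
thread. Each reserved block is stamped ahead of time, and the dump code
refuses to overwrite a block that does not carry the expected stamp.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DUMP_BLOCK_SIZE 512
/* Hypothetical marker; the real patch's signature is not known here. */
#define DUMP_SIGNATURE "KMSGDUMP"	/* exactly 8 characters */

struct dump_block_header {
	char signature[8];
	uint32_t block_number;
};

/* Stamp block n of the pre-reserved area (done while the system is healthy). */
static void stamp_block(unsigned char *block, uint32_t n)
{
	struct dump_block_header hdr;

	assert(block != NULL);
	memset(block, 0, DUMP_BLOCK_SIZE);
	memcpy(hdr.signature, DUMP_SIGNATURE, sizeof(hdr.signature));
	hdr.block_number = n;
	memcpy(block, &hdr, sizeof(hdr));
}

/*
 * At dump time: return 1 only if the block carries the signature and the
 * expected block number, i.e. it is safe to overwrite with log data.
 */
static int block_is_reserved(const unsigned char *block, uint32_t expected)
{
	struct dump_block_header hdr;

	memcpy(&hdr, block, sizeof(hdr));
	return memcmp(hdr.signature, DUMP_SIGNATURE,
		      sizeof(hdr.signature)) == 0 &&
	       hdr.block_number == expected;
}
```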

If anyone wants to work on this, I'll put the additional
code on the web.

---
~Randy

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-08 19:40         ` Grant Coady
@ 2006-01-09  1:45           ` Randy.Dunlap
  2006-01-09 16:15             ` Jan Engelhardt
  0 siblings, 1 reply; 75+ messages in thread
From: Randy.Dunlap @ 2006-01-09  1:45 UTC (permalink / raw)
  To: gcoady; +Cc: vherva, linux-kernel

On Mon, 09 Jan 2006 06:40:57 +1100 Grant Coady wrote:

> On Sun, 8 Jan 2006 05:53:22 -0800, "Randy.Dunlap" <rdunlap@xenotime.net> wrote:
> 
> >On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:
> >
> >> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
> >> > 
> >> > If I had any faith in the sturdyness of the floppy driver, I'd
> >> > recommend someone looked into a 'dump oops to floppy' patch, but
> >> > it too relies on a large part of the system being in a sane
> >> > enough state to write blocks out to disk.
> >> 
> >> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> >> minimal 16-bit floppy driver to save the oops dump. 
> >
> >It just switches to real mode and uses BIOS calls.
> 
> So would it be viable to take over the screen in similar fashion?
> 
> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops 
> screen, or Poops for short :o)

It does take over the screen.  80x50 isn't needed since it knows how
to scroll the kernel log buffer on 80x25.

---
~Randy

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-09  1:45           ` Randy.Dunlap
@ 2006-01-09 16:15             ` Jan Engelhardt
  2006-01-09 16:25               ` Ville Herva
  2006-01-09 16:39               ` Randy.Dunlap
  0 siblings, 2 replies; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-09 16:15 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: gcoady, vherva, linux-kernel

>> So would it be viable to take over the screen in similar fashion?
>> 
>> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops 
>> screen, or Poops for short :o)
>
>It does take over the screen.  80x50 isn't needed since it knows how
>to scroll the kernel log buffer on 80x25.

It's needed because scrolling back might be impossible (shift-up in panic 
= no-go), not because it knows how to scroll.


Jan Engelhardt
-- 
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-09 16:15             ` Jan Engelhardt
@ 2006-01-09 16:25               ` Ville Herva
  2006-01-09 16:39               ` Randy.Dunlap
  1 sibling, 0 replies; 75+ messages in thread
From: Ville Herva @ 2006-01-09 16:25 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Randy.Dunlap, gcoady, linux-kernel

On Mon, Jan 09, 2006 at 05:15:55PM +0100, you [Jan Engelhardt] wrote:
> >> So would it be viable to take over the screen in similar fashion?
> >> 
> >> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops 
> >> screen, or Poops for short :o)
> >
> >It does take over the screen.  80x50 isn't needed since it knows how
> >to scroll the kernel log buffer on 80x25.
> 
> It's needed because scrolling back might be impossible (shift-up in panic 
> = no-go), not because it knows how to scroll.

Please try kmsgdump. 

It has its own real-mode terminal (with scrolling) to which it switches on
oops. Hung kernel console doesn't affect it.



-- v -- 

v@iki.fi


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: oops pauser.
  2006-01-09 16:15             ` Jan Engelhardt
  2006-01-09 16:25               ` Ville Herva
@ 2006-01-09 16:39               ` Randy.Dunlap
  1 sibling, 0 replies; 75+ messages in thread
From: Randy.Dunlap @ 2006-01-09 16:39 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Randy.Dunlap, gcoady, vherva, linux-kernel

On Mon, 9 Jan 2006, Jan Engelhardt wrote:

> >> So would it be viable to take over the screen in similar fashion?
> >>
> >> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
> >> screen, or Poops for short :o)
> >
> >It does take over the screen.  80x50 isn't needed since it knows how
> >to scroll the kernel log buffer on 80x25.
>
> It's needed because scrolling back might be impossible (shift-up in panic
> = no-go), not because it knows how to scroll.

Oh, I see.  You are talking about the kernel message(s), not
kmsgdump.  Sorry, I switched to kmsgdump there somehow.
Yes, more info on the screen from the kernel would be good.

-- 
~Randy

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Console debugging wishlist was: Re: oops pauser.
  2006-01-05  4:52 oops pauser Dave Jones
                   ` (4 preceding siblings ...)
  2006-01-05 14:39 ` Kyle McMartin
@ 2006-01-09 18:43 ` Andi Kleen
  2006-01-10 20:25   ` Jan Engelhardt
  5 siblings, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2006-01-09 18:43 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-kernel

Dave Jones <davej@redhat.com> writes:

> In my quest to get better debug data from users in Fedora bug reports,
> I came up with this patch.  A majority of users don't have serial
> consoles, so when an oops scrolls off the top of the screen,
> and locks up, they usually end up reporting a 2nd (or later) oops
> that isn't particularly helpful (or worse, some inconsequential
> info like 'sleeping whilst atomic' warnings)

Ok - here's my personal wishlist. If someone is interested ...

What I would like to have is a "more" option for the kernel that makes
it page kernel output like "more" and asks you before scrolling
to the next page.

What would be also cool would be to fix the VGA console to have 
a larger scroll back buffer.  The standard kernel boot output 
is far larger than the default scrollback, so if you get a hang
late you have no way to look back to all the earlier 
messages.

(it is hard to understand that with 128MB+ graphic cards and 512+MB
computers the scroll back must be still so short...) 

And fixing sysrq to work after panics would be also nice.

And maybe a sysrq key to switch the font to the smallest one available
so as much as possible would fit onto a digital photo.
> 
> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes. Future patches could add some additional feedback
> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.

That's the killer issue that makes this patch a bad idea.

-Andi

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-09 18:43 ` Console debugging wishlist was: " Andi Kleen
@ 2006-01-10 20:25   ` Jan Engelhardt
  2006-01-10 20:29     ` Josef Sipek
                       ` (2 more replies)
  0 siblings, 3 replies; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-10 20:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Dave Jones, linux-kernel

>Ok - here's my personal wishlist. If someone is interested ...
>
>What I would like to have is a "more" option for the kernel that makes
>it page kernel output like "more" and asks you before scrolling
>to the next page.

An oops is usually a condition you can recover from in some/most/depends 
cases (e.g. a null deref in a filesystem "only" makes that vfsmount 
(filesystem at all?) blocked), so if the kernel is waiting for user input 
on a non-panic condition, this means userspace stops too, which is not 
too good if the kernel is still 'alive'.
It's like we are entering kdb although everything is fine enough to go 
through a proper `init 6`.

>What would be also cool would be to fix the VGA console to have 
>a larger scroll back buffer.  The standard kernel boot output 
>is far larger than the default scrollback, so if you get a hang
>late you have no way to look back to all the earlier 
>messages.
>
>(it is hard to understand that with 128MB+ graphic cards and 512+MB
>computers the scroll back must be still so short...) 

I doubt this scrollback buffer is implemented as part of the video cards. 
It is rather a kernel invention, and therefore uses standard RAM. But the 
idea is good, preferably make it a CONFIG_ option.

>And fixing sysrq to work after panics would be also nice.

I am not sure, but would enabling interrupts be enough?

>And maybe a sysrq key to switch the font to the smallest one available
>so as much as possible would fit onto a digital photo.

And analog photos? ;)

>> The one case this doesn't catch is the problem of oopses whilst
>> in X. Previously a non-fatal oops would stall X momentarily,
>> and then things continue. Now those cases will lock up completely
>> for two minutes. Future patches could add some additional feedback
>> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.
>
>That's the killer issues why this patch is a bad idea.
>

Whilst little can be done in X situations, let's at least improve consoles.


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 20:25   ` Jan Engelhardt
@ 2006-01-10 20:29     ` Josef Sipek
  2006-01-10 20:44       ` Jan Engelhardt
  2006-01-10 20:46       ` Andi Kleen
  2006-01-10 20:45     ` Andi Kleen
  2006-01-11 12:24     ` Antonino A. Daplas
  2 siblings, 2 replies; 75+ messages in thread
From: Josef Sipek @ 2006-01-10 20:29 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Andi Kleen, Dave Jones, linux-kernel

On Tue, Jan 10, 2006 at 09:25:46PM +0100, Jan Engelhardt wrote:
> >What would be also cool would be to fix the VGA console to have 
> >a larger scroll back buffer.  The standard kernel boot output 
> >is far larger than the default scrollback, so if you get a hang
> >late you have no way to look back to all the earlier 
> >messages.
> >
> >(it is hard to understand that with 128MB+ graphic cards and 512+MB
> >computers the scroll back must be still so short...) 
> 
> I doubt this scrollback buffer is implemented as part of the video cards. 
> It is rather a kernel invention, and therefore uses standard RAM. But the 
> idea is good, preferably make it a CONFIG_ option.

There is a config option that lets you specify the size of this buffer:
CONFIG_LOG_BUF_SHIFT

Jeff.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 20:29     ` Josef Sipek
@ 2006-01-10 20:44       ` Jan Engelhardt
  2006-01-10 22:54         ` Josef Sipek
  2006-01-10 20:46       ` Andi Kleen
  1 sibling, 1 reply; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-10 20:44 UTC (permalink / raw)
  To: Josef Sipek; +Cc: Andi Kleen, Dave Jones, linux-kernel


>> I doubt this scrollback buffer is implemented as part of the video cards. 
>> It is rather a kernel invention, and therefore uses standard RAM. But the 
>> idea is good, preferably make it a CONFIG_ option.
>
>There is a config option that lets you specify the size of this buffer:
>CONFIG_LOG_BUF_SHIFT

menuconfig help says

    "Select kernel log buffer size as a power of 2."

That does not sound like "console scroll buffer".
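
For reference, "a power of 2" here means the kernel log buffer is
1 << CONFIG_LOG_BUF_SHIFT bytes. A small sketch of the arithmetic; the
value 14 is only an example, and log_buf_len() is a helper made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Example value only; the real one comes from the kernel .config. */
#define CONFIG_LOG_BUF_SHIFT 14

/* The log buffer length in bytes is two raised to the configured shift. */
static size_t log_buf_len(unsigned int shift)
{
	assert(shift < sizeof(size_t) * 8);
	return (size_t)1 << shift;
}
```

So a shift of 14 gives 16 KB and a shift of 17 gives 128 KB.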



Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 20:25   ` Jan Engelhardt
  2006-01-10 20:29     ` Josef Sipek
@ 2006-01-10 20:45     ` Andi Kleen
  2006-01-10 21:06       ` Jan Engelhardt
  2006-01-11 12:24     ` Antonino A. Daplas
  2 siblings, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2006-01-10 20:45 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Dave Jones, linux-kernel

On Tuesday 10 January 2006 21:25, Jan Engelhardt wrote:
> An oops is usually a condition you can recover from in some/most/depends 
> cases (e.g. a null deref in a filesystem "only" makes that vfsmount 
> (filesystem at all?) blocked), so if the kernel is waiting for user input 
> on a non-panic condition, this means userspace stops too, which is not 
> too good if the kernel is still 'alive'.
> It's like we are entering kdb although everything is fine enough to go 
> through a proper `init 6`.

-ENOPARSE

> 
> >What would be also cool would be to fix the VGA console to have 
> >a larger scroll back buffer.  The standard kernel boot output 
> >is far larger than the default scrollback, so if you get a hang
> >late you have no way to look back to all the earlier 
> >messages.
> >
> >(it is hard to understand that with 128MB+ graphic cards and 512+MB
> >computers the scroll back must be still so short...) 
> 
> I doubt this scrollback buffer is implemented as part of the video cards. 
> It is rather a kernel invention, and therefore uses standard RAM. But the 
> idea is good, preferably make it a CONFIG_ option.

At least long ago (when I last looked) it was in video RAM. 

> 
> >And fixing sysrq to work after panics would be also nice.
> 
> I am not sure, but would enabling interrupts be enough?

Interrupts are already enabled, but no - it's not.

Thank you for a useful contribution to the thread.

-Andi

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 20:29     ` Josef Sipek
  2006-01-10 20:44       ` Jan Engelhardt
@ 2006-01-10 20:46       ` Andi Kleen
  1 sibling, 0 replies; 75+ messages in thread
From: Andi Kleen @ 2006-01-10 20:46 UTC (permalink / raw)
  To: Josef Sipek; +Cc: Jan Engelhardt, Dave Jones, linux-kernel

On Tuesday 10 January 2006 21:29, Josef Sipek wrote:

> There is a config option that lets you specify the size of this buffer:
> CONFIG_LOG_BUF_SHIFT

That is the dmesg buffer, not the scroll back buffer. Completely different
things.

-Andi


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 20:45     ` Andi Kleen
@ 2006-01-10 21:06       ` Jan Engelhardt
  2006-01-10 21:18         ` Andi Kleen
  0 siblings, 1 reply; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-10 21:06 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Dave Jones, linux-kernel


>-ENOPARSE

Try the oops.ko from http://jengelh.hopto.org/f/oops_ko.tbz2. It won't kill
your system, you can continue to work.

If you now had a kernel-level pager that would jump in every time an oops
happened, control would normally not be given back to userspace unless we quit
the pager. kdb has a similar behavior: it "stops" userspace until someone
chooses to "c"ontinue.
Therefore this pager would not be too good. In a panic, yes, it would be 
perfect.

I hope this makes it a little bit clearer, if not, -EAGAIN.

>> >(it is hard to understand that with 128MB+ graphic cards and 512+MB
>> >computers the scroll back must be still so short...) 
>> 
>> I doubt this scrollback buffer is implemented as part of the video cards. 
>> It is rather a kernel invention, and therefore uses standard RAM. But the 
>> idea is good, preferably make it a CONFIG_ option.
>
>At least long ago (when I last looked) it was in video RAM. 

Let's put it from another POV: if the scrollback buffer was somewhere within
the video card, it would usually not be cleared when you change from one
console tty to another. Currently, doing this switch clears the buffer (can we
do anything about that? - would be great)



Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 21:06       ` Jan Engelhardt
@ 2006-01-10 21:18         ` Andi Kleen
  2006-01-10 21:30           ` Jan Engelhardt
  0 siblings, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2006-01-10 21:18 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Dave Jones, linux-kernel

On Tuesday 10 January 2006 22:06, Jan Engelhardt wrote:

> If you now had a kernel-level pager that would jump in every time an oops
> happened, control would normally not be given back to userspace unless we quit
> the pager. kdb has a similar behavior: it "stops" userspace until someone
> chooses to "c"ontinue.
> Therefore this pager would not be too good. In a panic, yes, it would be 
> perfect.

First, for a recoverable oops there is no reason you couldn't use
schedule_timeout(). And for those you don't need it anyway,
because you can just as well use dmesg. For others you can use poll loops.

But that wasn't actually my point. If you get
a problem during bootup - not necessarily an oops, it could
also be a no-root panic or your SCSI controller not working or
something else - and you can reproduce it, it's a PITA to examine
the kernel output that came before it, because there is no way to get
enough scrollback.  For the oops itself it's not needed - it typically
fits on the screen. But if it happens every boot it would be nice
if you could just boot with "more" and then page through
the kernel output and check what's going on.
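
A rough userspace sketch of what such a "more" mode would do; the names
(boot_pager, pager_emit) and the wait_for_key callback are invented for
illustration, and a real version would have to hook the console output
path and poll the keyboard rather than use stdio.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical pager state; rows would be the console height, e.g. 25. */
struct boot_pager {
	int rows;		/* lines that fit on one screen */
	int lines_on_page;	/* lines printed since the last pause */
	int pauses;		/* how many times we stopped for the user */
};

/*
 * Print one line.  When the page is full (one row is kept free for the
 * prompt), show a prompt, call wait_for_key() if one was supplied, and
 * start a new page before printing the line.
 */
static void pager_emit(struct boot_pager *p, const char *line, FILE *out,
		       void (*wait_for_key)(void))
{
	assert(p != NULL && p->rows > 1);

	if (p->lines_on_page >= p->rows - 1) {
		fputs("--More--\n", out);
		if (wait_for_key)
			wait_for_key();
		p->pauses++;
		p->lines_on_page = 0;
	}
	fprintf(out, "%s\n", line);
	p->lines_on_page++;
}
```

On an 80x25 console this stops after every 24 lines, so nothing scrolls
away before somebody has had a chance to read it.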

The feature would be mainly useful for problems during kernel bootup,
although it might be sometimes useful too e.g. when user space
hangs, but you want to page through the hotkey process dump
which might be longer than console scrollback.

Just more scrollback does not necessarily replace this because
sometimes you end up with so much output so quickly (e.g. some errors
are very verbose) that any scrollback buffer would overflow.

Now the only issue would be to work out when to use schedule_timeout()
and when to use a delay, but that can all be distinguished with some code.

Anyway, mind you - I suspect actually implementing this would be somewhat
ugly, so the chances of it actually getting in would likely be slim.
Still, it would often be useful.

-Andi
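
[Illustration of the "more" idea described above. This is a minimal
user-space sketch, not kernel code and not a proposed patch. The names
page_end() and pauses_needed() are hypothetical, and it assumes every
line fits on one screen row (no wrapping). It only shows where a pager
would stop and wait for a key before letting more output scroll by.]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the offset just past the page that starts at text[start].
 * A page holds at most `rows` newline-terminated lines. If the text
 * ends first, the offset of the terminating NUL is returned.
 */
size_t page_end(const char *text, size_t start, int rows)
{
    int lines = 0;
    size_t i = start;

    assert(rows > 0);
    while (text[i] != '\0') {
        /* Advance past every character; stop after the rows-th newline. */
        if (text[i++] == '\n' && ++lines == rows)
            break;
    }
    return i;
}

/* Count how often a "more"-style pager would stop and ask for a key. */
int pauses_needed(const char *text, int rows)
{
    int pauses = 0;
    size_t pos = page_end(text, 0, rows);

    while (text[pos] != '\0') {   /* more output follows: pause here */
        pauses++;
        pos = page_end(text, pos, rows);
    }
    return pauses;
}
```

[With five one-character lines and two rows per page, the pager stops
twice. How the wait itself is done (sleeping versus polling) is the
separate question raised in the mail above and is not modelled here.]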


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 21:18         ` Andi Kleen
@ 2006-01-10 21:30           ` Jan Engelhardt
  0 siblings, 0 replies; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-10 21:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Dave Jones, linux-kernel


>But that wasn't actually my point. If you get
>a problem during bootup - not necessarily an oops, but it could
>also be a no-root panic or your SCSI controller not working or
>something else - and you can reproduce it, it's a PITA to examine
>the earlier kernel output because there is no way to get
>enough scrollback.  For the oops itself it's not needed - it typically
>fits on the screen. But if it happens every boot it would be nice
>if you could just boot with "more" and then page through
>the kernel output and check what's going on.

Ah yes, I had not considered boot oopses/panics. My bad.



Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 20:44       ` Jan Engelhardt
@ 2006-01-10 22:54         ` Josef Sipek
  0 siblings, 0 replies; 75+ messages in thread
From: Josef Sipek @ 2006-01-10 22:54 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Andi Kleen, Dave Jones, linux-kernel

On Tue, Jan 10, 2006 at 09:44:43PM +0100, Jan Engelhardt wrote:
> 
> >> I doubt this scrollback buffer is implemented as part of the video cards. 
> >> It is rather a kernel invention, and therefore uses standard RAM. But the 
> >> idea is good, preferably make it a CONFIG_ option.
> >
> >There is a config option that lets you specify the size of this buffer:
> >CONFIG_LOG_BUF_SHIFT
> 
> menuconfig help says
> 
>     "Select kernel log buffer size as a power of 2."
> 
> That does not sound like "console scroll buffer".

True. I should think more about what I say before I say it.

Jeff.
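
[Illustration of the "power of 2" wording quoted above. A minimal sketch:
the helper name log_buf_len() is hypothetical, and the shift values used
in the note below are only examples. The option sizes the printk log
buffer that dmesg reads, which is why it does not help with console
scrollback.]

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Size in bytes of a buffer configured as a power-of-two shift. */
size_t log_buf_len(unsigned int shift)
{
    /* Shifting by the full width of the type would be undefined. */
    assert(shift < sizeof(size_t) * CHAR_BIT);
    return (size_t)1 << shift;
}
```

[For example, a shift of 14 gives 16384 bytes (16 KB) and a shift of 17
gives 131072 bytes (128 KB).]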

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-10 20:25   ` Jan Engelhardt
  2006-01-10 20:29     ` Josef Sipek
  2006-01-10 20:45     ` Andi Kleen
@ 2006-01-11 12:24     ` Antonino A. Daplas
  2006-01-11 12:31       ` Andi Kleen
  2 siblings, 1 reply; 75+ messages in thread
From: Antonino A. Daplas @ 2006-01-11 12:24 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Andi Kleen, Dave Jones, linux-kernel

Jan Engelhardt wrote:
>> Ok - here's my personal wishlist. If someone is interested ...
>>
>> What I would like to have is a "more" option for the kernel that makes
>> it page kernel output like "more" and asks you before scrolling
>> to the next page.
> 
> An oops is usually a condition you can recover from in some/most/depends 
> cases (e.g. a null deref in a filesystem "only" makes that vfsmount 
> (filesystem at all?) blocked), so if the kernel is waiting for user input 
> on a non-panic condition, this means userspace stops too, which is not 
> too good if the kernel is still 'alive'.
> It's like we are entering kdb although everything is fine enough to go 
> through a proper `init 6`.
> 
>> What would be also cool would be to fix the VGA console to have 
>> a larger scroll back buffer.  The standard kernel boot output 
>> is far larger than the default scrollback, so if you get a hang
>> late you have no way to look back to all the earlier 
>> messages.
>>
>> (it is hard to understand that with 128MB+ graphic cards and 512+MB
>> computers the scroll back must be still so short...) 
> 
> I doubt this scrollback buffer is implemented as part of the video cards. 
> It is rather a kernel invention, and therefore uses standard RAM. But the 
> idea is good, preferably make it a CONFIG_ option.

In the VGA console, all buffers, including scrollback, are in video RAM, but
the size is fixed and is very small.

With the framebuffer console, you can increase the size of the scrollback
buffer with the boot option:

fbcon=scrollback:<n> (default is 32K)

Tony

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-11 12:24     ` Antonino A. Daplas
@ 2006-01-11 12:31       ` Andi Kleen
  2006-01-11 13:05         ` Antonino A. Daplas
  0 siblings, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2006-01-11 12:31 UTC (permalink / raw)
  To: Antonino A. Daplas; +Cc: Jan Engelhardt, Dave Jones, linux-kernel

On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:

> In the VGA console, all buffers, including scrollback, are in video RAM, but
> the size is fixed and is very small.

I wonder if that can be fixed.

> With the framebuffer console, you can increase the size of the scrollback
> buffer with the boot option:
> 
> fbcon=scrollback:<n> (default is 32K)

On x86-64 vesafb is unusably slow because it does CPU scrolling, since
it can't use the VESA BIOS - and the others don't work everywhere. So I don't
think fbcon is a usable replacement.

-Andi

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-11 12:31       ` Andi Kleen
@ 2006-01-11 13:05         ` Antonino A. Daplas
  2006-01-11 13:17           ` Andi Kleen
  2006-01-11 18:34           ` Jan Engelhardt
  0 siblings, 2 replies; 75+ messages in thread
From: Antonino A. Daplas @ 2006-01-11 13:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jan Engelhardt, Dave Jones, linux-kernel

Andi Kleen wrote:
> On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:
> 
>> In the VGA console, all buffers, including scrollback, are in video RAM, but
>> the size is fixed and is very small.
> 
> I wonder if that can be fixed.

It can be done, but it will affect VGA console performance.
 
> 
>> With the framebuffer console, you can increase the size of the scrollback
>> buffer with the boot option:
>>
>> fbcon=scrollback:<n> (default is 32K)
> 
> On x86-64 vesafb is unusably slow because it does CPU scrolling, since
> it can't use the VESA BIOS - and the others don't work everywhere. So I don't
> think fbcon is a usable replacement.

How about vga16fb + fbcon? If scrolling is slow in vga16fb, fbset -vyres 800 should
increase performance significantly.

Tony

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-11 13:05         ` Antonino A. Daplas
@ 2006-01-11 13:17           ` Andi Kleen
  2006-01-11 13:43             ` Antonino A. Daplas
  2006-01-11 18:34           ` Jan Engelhardt
  1 sibling, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2006-01-11 13:17 UTC (permalink / raw)
  To: Antonino A. Daplas; +Cc: Jan Engelhardt, Dave Jones, linux-kernel

On Wednesday 11 January 2006 14:05, Antonino A. Daplas wrote:
> Andi Kleen wrote:
> > On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:
> > 
> >> In the VGA console, all buffers, including scrollback, are in video RAM, but
> >> the size is fixed and is very small.
> > 
> > I wonder if that can be fixed.
> 
> It can be done, but it will affect VGA console performance.

By how much? As long as it still scrolls reasonably fast it would be ok for me.

>  
> > 
> >> With the framebuffer console, you can increase the size of the scrollback
> >> buffer with the boot option:
> >>
> >> fbcon=scrollback:<n> (default is 32K)
> > 
> > On x86-64 vesafb is unusably slow because it does CPU scrolling, since
> > it can't use the VESA BIOS - and the others don't work everywhere. So I don't
> > think fbcon is a usable replacement.
> 
> How about vga16fb + fbcon? If scrolling is slow in vga16fb, fbset -vyres 800 should
> increase performance significantly.

I can try it.

-Andi

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-11 13:17           ` Andi Kleen
@ 2006-01-11 13:43             ` Antonino A. Daplas
  2006-01-11 13:51               ` Andi Kleen
  0 siblings, 1 reply; 75+ messages in thread
From: Antonino A. Daplas @ 2006-01-11 13:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jan Engelhardt, Dave Jones, linux-kernel

Andi Kleen wrote:
> On Wednesday 11 January 2006 14:05, Antonino A. Daplas wrote:
>> Andi Kleen wrote:
>>> On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:
>>>
>>>> In the VGA console, all buffers, including scrollback, are in video RAM, but
>>>> the size is fixed and is very small.
>>> I wonder if that can be fixed.
>> It can be done, but it will affect VGA console performance.
> 
> By how much? As long as it still scrolls reasonably fast it would be ok for me.

Each character will need to be written twice, once to VGA RAM and once to
the shadow/scrollback buffer in system RAM. It would still be reasonably fast.

Perhaps I can implement this for vgacon.

Tony
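
[Illustration of the double write described above. This is a user-space
sketch under stated assumptions, not the vgacon implementation: the
struct and function names are hypothetical, the 200-row ring depth is
arbitrary, the vram array merely stands in for VGA text memory, and
output is always written on the bottom screen row.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COLS        80
#define ROWS        25
#define SHADOW_ROWS 200   /* hypothetical scrollback depth, in text rows */

struct con {
    uint16_t vram[ROWS][COLS];          /* stand-in for VGA text memory */
    uint16_t shadow[SHADOW_ROWS][COLS]; /* scrollback ring in system RAM */
    unsigned long line;                 /* absolute number of current line */
    unsigned int col;                   /* cursor column on that line */
};

void con_init(struct con *c)
{
    memset(c, 0, sizeof(*c));
}

static void con_newline(struct con *c)
{
    c->line++;
    c->col = 0;
    /* Scroll the visible screen up; the top row is lost from "VRAM". */
    memmove(c->vram[0], c->vram[1], sizeof(c->vram[0]) * (ROWS - 1));
    memset(c->vram[ROWS - 1], 0, sizeof(c->vram[0]));
    /* Recycle the oldest ring row for the new line. */
    memset(c->shadow[c->line % SHADOW_ROWS], 0, sizeof(c->shadow[0]));
}

void con_putc(struct con *c, char ch)
{
    uint16_t cell;

    if (ch == '\n') {
        con_newline(c);
        return;
    }
    if (c->col == COLS)
        con_newline(c);                          /* wrap long lines */
    cell = (uint16_t)(0x0700 | (unsigned char)ch); /* attribute + glyph */
    c->vram[ROWS - 1][c->col] = cell;            /* write 1: video memory */
    c->shadow[c->line % SHADOW_ROWS][c->col] = cell; /* write 2: system RAM */
    c->col++;
}

/* Cell `back` lines above the current line, or 0 if no longer stored. */
uint16_t con_scrollback(const struct con *c, unsigned long back,
                        unsigned int col)
{
    if (back >= SHADOW_ROWS || back > c->line || col >= COLS)
        return 0;
    return c->shadow[(c->line - back) % SHADOW_ROWS][col];
}
```

[The extra cost per character is one additional 16-bit store to ordinary
memory, and the depth of the ring is no longer tied to the size of the
video text window.]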

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-11 13:43             ` Antonino A. Daplas
@ 2006-01-11 13:51               ` Andi Kleen
  0 siblings, 0 replies; 75+ messages in thread
From: Andi Kleen @ 2006-01-11 13:51 UTC (permalink / raw)
  To: Antonino A. Daplas; +Cc: Jan Engelhardt, Dave Jones, linux-kernel

On Wednesday 11 January 2006 14:43, Antonino A. Daplas wrote:
> Andi Kleen wrote:
> > On Wednesday 11 January 2006 14:05, Antonino A. Daplas wrote:
> >> Andi Kleen wrote:
> >>> On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:
> >>>
> >>>> In the VGA console, all buffers, including scrollback, are in video RAM, but
> >>>> the size is fixed and is very small.
> >>> I wonder if that can be fixed.
> >> It can be done, but it will affect VGA console performance.
> > 
> > By how much? As long as it still scrolls reasonably fast it would be ok for me.
> 
> Each character will need to be written twice, once to VGA RAM and once to
> the shadow/scrollback buffer in system RAM.

That should be basically unnoticeable. 

> It would still be reasonably fast. 
> 
> Perhaps I can implement this for vgacon.

Please do. And increase the default scrollback please or make it a CONFIG.

Thanks,
-Andi

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-11 13:05         ` Antonino A. Daplas
  2006-01-11 13:17           ` Andi Kleen
@ 2006-01-11 18:34           ` Jan Engelhardt
  1 sibling, 0 replies; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-11 18:34 UTC (permalink / raw)
  To: Antonino A. Daplas; +Cc: Andi Kleen, Dave Jones, linux-kernel

>>> With the framebuffer console, you can increase the size of the scrollback
>>> buffer with the boot option:
>>>
>>> fbcon=scrollback:<n> (default is 32K)
>> 
>> On x86-64 vesafb is unusably slow because it does CPU scrolling, since
>> it can't use the VESA BIOS - and the others don't work everywhere. So I don't
>> think fbcon is a usable replacement.
>
>How about vga16fb + fbcon? If scrolling is slow in vga16fb, fbset -vyres 800 should
>increase performance significantly.
>

Benchmarks first.


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-15 17:13     ` Andi Kleen
@ 2006-01-15 20:51       ` Jan Engelhardt
  0 siblings, 0 replies; 75+ messages in thread
From: Jan Engelhardt @ 2006-01-15 20:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: 7eggert, Dave Jones, linux-kernel

>> > (it is hard to understand that with 128MB+ graphic cards and 512+MB
>> > computers the scroll back must be still so short...)
>> 
>> The VGA scrollback buffer is limited by the text area of the video RAM.
>> The text area is in the DOS memory at 0xB800 (or 0xB000) and extends
>> 32 KB (or in case of MDA, 4 KB). Each character will use 2 Bytes.
>> Therefore you can store up to 16,000 characters or 4 pages of text.
>
>It was a rhetorical question.
>
And I assumed that scrollback was stored in some regular kmalloc()ed page(s).


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
  2006-01-15 16:48   ` Bodo Eggert
@ 2006-01-15 17:13     ` Andi Kleen
  2006-01-15 20:51       ` Jan Engelhardt
  0 siblings, 1 reply; 75+ messages in thread
From: Andi Kleen @ 2006-01-15 17:13 UTC (permalink / raw)
  To: 7eggert; +Cc: Dave Jones, linux-kernel

On Sunday 15 January 2006 17:48, Bodo Eggert wrote:
> Andi Kleen <ak@suse.de> wrote:
> 
> > (it is hard to understand that with 128MB+ graphic cards and 512+MB
> > computers the scroll back must be still so short...)
> 
> The VGA scrollback buffer is limited by the text area of the video RAM.
> The text area is in the DOS memory at 0xB800 (or 0xB000) and extends
> 32 KB (or in case of MDA, 4 KB). Each character will use 2 Bytes.
> Therefore you can store up to 16,000 characters or 4 pages of text.

It was a rhetorical question.

-Andi


^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: Console debugging wishlist was: Re: oops pauser.
       [not found] ` <5tagc-6AZ-25@gated-at.bofh.it>
@ 2006-01-15 16:48   ` Bodo Eggert
  2006-01-15 17:13     ` Andi Kleen
  0 siblings, 1 reply; 75+ messages in thread
From: Bodo Eggert @ 2006-01-15 16:48 UTC (permalink / raw)
  To: Andi Kleen, Dave Jones, linux-kernel

Andi Kleen <ak@suse.de> wrote:

> (it is hard to understand that with 128MB+ graphic cards and 512+MB
> computers the scroll back must be still so short...)

The VGA scrollback buffer is limited by the text area of the video RAM.
The text area is in the DOS memory at 0xB800 (or 0xB000) and extends
32 KB (or in case of MDA, 4 KB). Each character will use 2 Bytes.
Therefore you can store up to 16,000 characters or 4 pages of text.
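
[Worked arithmetic for the figures above, as a small sketch. The helper
names are hypothetical, and the 80x25 and 80x50 geometries are
assumptions chosen for illustration. Note that 0xB800 and 0xB000 are
real-mode segment values, i.e. physical addresses 0xB8000 and 0xB0000.]

```c
#include <assert.h>

#define BYTES_PER_CELL 2u   /* one character byte plus one attribute byte */

/* Number of character cells that fit in a text window of this size. */
unsigned int vga_text_cells(unsigned int window_bytes)
{
    return window_bytes / BYTES_PER_CELL;
}

/* Number of complete screens of cols x rows that fit in the window. */
unsigned int vga_text_pages(unsigned int window_bytes,
                            unsigned int cols, unsigned int rows)
{
    assert(cols > 0 && rows > 0);
    return vga_text_cells(window_bytes) / (cols * rows);
}
```

[A 32 KB window holds 16384 cells. That is 4 complete screens at 80x50,
which matches the figure above, or 8 complete screens at 80x25. The 4 KB
MDA window holds 2048 cells, a single 80x25 screen.]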

-- 
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2006-01-15 20:51 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-05  4:52 oops pauser Dave Jones
2006-01-05  6:10 ` oops pauser. / boot_delayer Randy.Dunlap
2006-01-05  7:30   ` Bernd Eckenfels
2006-01-05  8:07     ` Jan Engelhardt
2006-01-06  1:28       ` David Lang
2006-01-06  5:36         ` Dave Jones
2006-01-06  7:00           ` David Lang
2006-01-08 13:21           ` Pavel Machek
2006-01-08 19:30             ` Josef Sipek
2006-01-08 23:08               ` Pavel Machek
2006-01-08 23:39                 ` Josef Sipek
2006-01-06  7:36         ` Jan Engelhardt
2006-01-06  8:33           ` David Lang
2006-01-05  9:25     ` Grant Coady
2006-01-05 15:31       ` Mark Lord
2006-01-05 15:38         ` Avishay Traeger
2006-01-05 19:15           ` Mark Lord
2006-01-05 11:11     ` Dave Jones
2006-01-07 21:44       ` Kurtis D. Rader
2006-01-07 21:48         ` Arjan van de Ven
2006-01-07 22:00           ` Kurtis D. Rader
2006-01-08 23:29           ` David Lang
2006-01-07 22:27         ` Bernd Eckenfels
2006-01-05  8:15 ` oops pauser Jan Engelhardt
2006-01-05 10:33   ` Dave Jones
2006-01-05 11:05     ` Jan Engelhardt
2006-01-05 12:05       ` Keith Owens
2006-01-05 15:17       ` Jesper Juhl
2006-01-05 13:46     ` Kurt Wall
2006-01-06  1:24     ` David Lang
2006-01-06  1:41       ` Josef Sipek
2006-01-08 13:38     ` Ville Herva
2006-01-08 13:53       ` Randy.Dunlap
2006-01-08 19:35         ` Jan Engelhardt
2006-01-09  1:43           ` Randy.Dunlap
2006-01-08 19:40         ` Grant Coady
2006-01-09  1:45           ` Randy.Dunlap
2006-01-09 16:15             ` Jan Engelhardt
2006-01-09 16:25               ` Ville Herva
2006-01-09 16:39               ` Randy.Dunlap
2006-01-05 13:37 ` Alan Cox
2006-01-05 20:52   ` Dave Jones
2006-01-06 13:31     ` Alan Cox
2006-01-06 20:33       ` Dave Jones
2006-01-06 15:22     ` Pavel Machek
2006-01-06 19:06       ` Jan Engelhardt
2006-01-06 22:34         ` Pavel Machek
2006-01-06 22:48       ` Dave Jones
2006-01-05 13:58 ` Avishay Traeger
2006-01-05 20:54   ` Dave Jones
2006-01-06  0:19   ` Josef Sipek
2006-01-06  1:12     ` Bernd Eckenfels
2006-01-06  1:35       ` Josef Sipek
2006-01-06  2:21         ` Bernd Eckenfels
2006-01-05 14:39 ` Kyle McMartin
2006-01-09 18:43 ` Console debugging wishlist was: " Andi Kleen
2006-01-10 20:25   ` Jan Engelhardt
2006-01-10 20:29     ` Josef Sipek
2006-01-10 20:44       ` Jan Engelhardt
2006-01-10 22:54         ` Josef Sipek
2006-01-10 20:46       ` Andi Kleen
2006-01-10 20:45     ` Andi Kleen
2006-01-10 21:06       ` Jan Engelhardt
2006-01-10 21:18         ` Andi Kleen
2006-01-10 21:30           ` Jan Engelhardt
2006-01-11 12:24     ` Antonino A. Daplas
2006-01-11 12:31       ` Andi Kleen
2006-01-11 13:05         ` Antonino A. Daplas
2006-01-11 13:17           ` Andi Kleen
2006-01-11 13:43             ` Antonino A. Daplas
2006-01-11 13:51               ` Andi Kleen
2006-01-11 18:34           ` Jan Engelhardt
     [not found] <5rvok-5Sr-1@gated-at.bofh.it>
     [not found] ` <5tagc-6AZ-25@gated-at.bofh.it>
2006-01-15 16:48   ` Bodo Eggert
2006-01-15 17:13     ` Andi Kleen
2006-01-15 20:51       ` Jan Engelhardt
