* Re: oops pauser.
2006-01-05 4:52 Dave Jones
@ 2006-01-05 8:15 ` Jan Engelhardt
2006-01-05 10:33 ` Dave Jones
2006-01-05 13:37 ` Alan Cox
` (2 subsequent siblings)
3 siblings, 1 reply; 40+ messages in thread
From: Jan Engelhardt @ 2006-01-05 8:15 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-kernel
>In my quest to get better debug data from users in Fedora bug reports,
>I came up with this patch. A majority of users don't have serial
>consoles, so when an oops scrolls off the top of the screen,
>and locks up, they usually end up reporting a 2nd (or later) oops
>that isn't particularly helpful (or worse, some inconsequential
>info like 'sleeping whilst atomic' warnings)
Here's something interesting too:
Sometimes, an oops is even longer than 25 rows, and the usual user
does not have
- VGA mode with a lot of lines (because it's hard to read)
- FB mode with a lot of lines (slow, and it's also hard to read)
Would it be possible to change the VGA mode to 80x43/80x50/80x60
during protected mode?
>With this patch, if we oops, there's a pause for two minutes..
>which hopefully gives people enough time to grab a digital camera
>to take a screenshot of the oops.
>
It would be ideal to have something like BSD's "dump to predefined
block device on oops", so extraction of oops logs requires neither
pen-and-paper nor a digital camera. Requires another partition that
can be used for it, though.
>The one case this doesn't catch is the problem of oopses whilst
>in X. Previously a non-fatal oops would stall X momentarily,
>and then things continue. Now those cases will lock up completely
>for two minutes. Future patches could add some additional feedback
>during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.
>
Oh yes, include Stas Sergeev's PCSP patch and play a WAV telling "your box
just crashed, wait two minutes for uh ... an oops you can't grab
either"(*).
(*) If the oops is longer than 25 lines, ... you can't even use scrollback
because scrollback is cleared when you change consoles. X runs by default
on tty7, and the kernel dumps it somewhere else. (And even if it dumped to
tty7 directly, you would not see it.)
Jan Engelhardt
--
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-05 8:15 ` Jan Engelhardt
@ 2006-01-05 10:33 ` Dave Jones
2006-01-05 11:05 ` Jan Engelhardt
` (3 more replies)
0 siblings, 4 replies; 40+ messages in thread
From: Dave Jones @ 2006-01-05 10:33 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: linux-kernel
On Thu, Jan 05, 2006 at 09:15:02AM +0100, Jan Engelhardt wrote:
> Here's something interesting too:
> Sometimes, an oops is even longer than 25 rows, and the usual user
> does not have
> - VGA mode with a lot of lines (because it's hard to read)
> - FB mode with a lot of lines (slow, and it's also hard to read)
See the other patch I sent which halves the amount of lines needed
for a backtrace on i386 (like x86-64 uses). This helps too.
> Would it be possible to change the VGA mode to 80x43/80x50/80x60
> during protected mode?
After an oops, we can't really rely on anything. What if the
oops came from the console layer, or a framebuffer driver?
> >With this patch, if we oops, there's a pause for two minutes..
> >which hopefully gives people enough time to grab a digital camera
> >to take a screenshot of the oops.
> >
> It would be ideal to have something like BSD's "dump to predefined
> block device on oops", so extraction of oops logs requires neither
> pen-and-paper nor a digital camera. Requires another partition that
> can be used for it, though.
I dislike most of the disk dump patches that I've seen out there
because most of them rely on the system being in a decent enough
state to be able to write out blocks of data.
If I had any faith in the sturdiness of the floppy driver, I'd
recommend someone looked into a 'dump oops to floppy' patch, but
it too relies on a large part of the system being in a sane
enough state to write blocks out to disk.
> (*) If the oops is longer than 25 lines, ... you can't even use scrollback
> because scrollback is cleared when you change consoles. X runs by default
> on tty7, and the kernel dumps it somewhere else. (And even if it dumped to
> tty7 directly, you would not see it.)
What to do about oopses whilst in X has been the subject of much
head-scratching for years now. It's come up at least at the
last two kernel summits, and I'll hazard a guess it'll come up
again this year. The amount of work necessary to make it all
work on both kernel side and X side isn't insubstantial however,
so I wouldn't count on it working too soon.
Hmm, SuSE/Novell folks, doesn't NLKD take over an X display?
ISTR during a demo at last year's OLS the presenter was flipping
in/out of the debugger between slides. Is there anything
useful there ?
Dave
* Re: oops pauser.
2006-01-05 10:33 ` Dave Jones
@ 2006-01-05 11:05 ` Jan Engelhardt
2006-01-05 12:05 ` Keith Owens
2006-01-05 15:17 ` Jesper Juhl
2006-01-05 13:46 ` Kurt Wall
` (2 subsequent siblings)
3 siblings, 2 replies; 40+ messages in thread
From: Jan Engelhardt @ 2006-01-05 11:05 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-kernel
>See the other patch I sent which halves the amount of lines needed
>for a backtrace on i386 (like x86-64 uses). This helps too.
>
.oO( Compress the oops, encode it base64 and display that instead )Oo. :-)
> > Would it be possible to change the VGA mode to 80x43/80x50/80x60
> > during protected mode?
>
>After an oops, we can't really rely on anything. What if the
>oops came from the console layer, or a framebuffer driver?
>
Well, setting the video mode can be done (on x86, ugh) with a BIOS call, so
we would not need to run through oops-affected code. But that was the
question, if this int 0x10 call was possible at all. Think of VBE -
VBE3 is the first version that can be done in protected mode.
>If I had any faith in the sturdiness of the floppy driver, I'd
>recommend someone looked into a 'dump oops to floppy' patch, but
>it too relies on a large part of the system being in a sane
>enough state to write blocks out to disk.
>
Right, sad world. (With fun I await the day someone writes a morse encoder
that writes oops to keyboard leds.)
Jan Engelhardt
--
* Re: oops pauser.
2006-01-05 11:05 ` Jan Engelhardt
@ 2006-01-05 12:05 ` Keith Owens
2006-01-05 15:17 ` Jesper Juhl
1 sibling, 0 replies; 40+ messages in thread
From: Keith Owens @ 2006-01-05 12:05 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Dave Jones, linux-kernel
Jan Engelhardt (on Thu, 5 Jan 2006 12:05:08 +0100 (MET)) wrote:
>>
>Right, sad world. (With fun I await the day someone writes a morse encoder
>that writes oops to keyboard leds.)
It's already been done, both leds and PC speaker. http://kerneltrap.org/node/575/2355
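
A rough illustration of the Morse idea mentioned above: the sketch below only turns a message into dots and dashes and leaves blinking the keyboard LEDs or beeping the PC speaker out entirely. It is not the code behind the kerneltrap link; `morse_encode` and the table names are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* International Morse code for A-Z and 0-9. */
static const char *const morse_letters[26] = {
    ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..",
    ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.",
    "...", "-", "..-", "...-", ".--", "-..-", "-.--", "--.."
};
static const char *const morse_digits[10] = {
    "-----", ".----", "..---", "...--", "....-",
    ".....", "-....", "--...", "---..", "----."
};

/*
 * Encode `text` into `out` as dots and dashes. Letters are separated by a
 * single space and words by " / ". Characters without a table entry are
 * skipped. Returns the number of bytes written (not counting the NUL),
 * or -1 if `out` is too small.
 */
int morse_encode(const char *text, char *out, size_t outsz)
{
    size_t used = 0;
    int pending_gap = 0;    /* a space was seen since the last symbol */

    assert(text != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    for (; *text; text++) {
        unsigned char c = (unsigned char)*text;
        int u = toupper(c);
        const char *sym, *sep;
        size_t seplen, symlen;

        if (u >= 'A' && u <= 'Z') {
            sym = morse_letters[u - 'A'];
        } else if (c >= '0' && c <= '9') {
            sym = morse_digits[c - '0'];
        } else {
            if (c == ' ' && used > 0)
                pending_gap = 1;
            continue;
        }

        sep = (used == 0) ? "" : (pending_gap ? " / " : " ");
        pending_gap = 0;
        seplen = strlen(sep);
        symlen = strlen(sym);
        if (used + seplen + symlen + 1 > outsz)
            return -1;

        memcpy(out + used, sep, seplen);
        memcpy(out + used + seplen, sym, symlen);
        used += seplen + symlen;
        out[used] = '\0';
    }
    return (int)used;
}
```

Calling `morse_encode("SOS", buf, sizeof(buf))` leaves `... --- ...` in `buf`. A real oops path would have to emit the symbols with busy-wait timing rather than build a string, since little else can be trusted at that point.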
* Re: oops pauser.
2006-01-05 11:05 ` Jan Engelhardt
2006-01-05 12:05 ` Keith Owens
@ 2006-01-05 15:17 ` Jesper Juhl
1 sibling, 0 replies; 40+ messages in thread
From: Jesper Juhl @ 2006-01-05 15:17 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Dave Jones, linux-kernel
On 1/5/06, Jan Engelhardt <jengelh@linux01.gwdg.de> wrote:
>
> >See the other patch I sent which halves the amount of lines needed
> >for a backtrace on i386 (like x86-64 uses). This helps too.
> >
> .oO( Compress the oops, encode it base64 and display that instead )Oo. :-)
>
Not really something we want to do at Oops time and even if the kernel
was in a sane enough state to actually do it you've just increased the
amount of work needing to be done to decode the Oops by everyone
receiving/wanting to read it.
I think a better idea is to try and move things around so the most
useful pieces of information are on the last lines of the Oops output
(most likely to not have scrolled off the screen) and also work to
eliminate lines that are not really useful/helpful and maybe try to
cram more info from multiple short lines into a single line.
--
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
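
To make the "last lines survive" argument above concrete, here is a small user-space sketch that works out which part of a long oops is still visible once everything above the bottom N rows has scrolled away. `last_rows` is a hypothetical helper written for this note, not a kernel function, and it ignores line wrapping at 80 columns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the start of the last `rows` lines of `text`.
 * A single trailing newline terminates the final line and is not counted
 * as an extra empty line. If `text` has `rows` lines or fewer, the whole
 * text is returned.
 */
const char *last_rows(const char *text, unsigned int rows)
{
    const char *end, *p;
    unsigned int seen = 0;  /* line boundaries passed, counted from the end */

    assert(text != NULL);
    end = text + strlen(text);
    if (rows == 0)
        return end;

    p = end;
    if (p > text && p[-1] == '\n')
        p--;                /* skip the newline that ends the last line */

    while (p > text) {
        if (p[-1] == '\n' && ++seen == rows)
            return p;       /* p is the first byte of the rows-th line from the end */
        p--;
    }
    return text;
}
```

With `rows` set to 25, whatever `last_rows` returns is what a user on a plain 80x25 console can still photograph, which is why putting the most useful fields at the end of the report helps.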
* Re: oops pauser.
2006-01-05 10:33 ` Dave Jones
2006-01-05 11:05 ` Jan Engelhardt
@ 2006-01-05 13:46 ` Kurt Wall
2006-01-06 1:24 ` David Lang
2006-01-08 13:38 ` Ville Herva
3 siblings, 0 replies; 40+ messages in thread
From: Kurt Wall @ 2006-01-05 13:46 UTC (permalink / raw)
To: Dave Jones, Jan Engelhardt, linux-kernel
On Thu, Jan 05, 2006 at 05:33:39AM -0500, Dave Jones took 0 lines to write:
>
> If I had any faith in the sturdiness of the floppy driver, I'd
> recommend someone looked into a 'dump oops to floppy' patch, but
> it too relies on a large part of the system being in a sane
> enough state to write blocks out to disk.
Not to mention that an increasing number of systems ship without a
floppy drive.
Kurt
--
If you perceive that there are four possible ways in which a procedure
can go wrong, and circumvent these, then a fifth way will promptly
develop.
* Re: oops pauser.
2006-01-05 10:33 ` Dave Jones
2006-01-05 11:05 ` Jan Engelhardt
2006-01-05 13:46 ` Kurt Wall
@ 2006-01-06 1:24 ` David Lang
2006-01-06 1:41 ` Josef Sipek
2006-01-08 13:38 ` Ville Herva
3 siblings, 1 reply; 40+ messages in thread
From: David Lang @ 2006-01-06 1:24 UTC (permalink / raw)
To: Dave Jones; +Cc: Jan Engelhardt, linux-kernel
On Thu, 5 Jan 2006, Dave Jones wrote:
> > (*) If the oops is longer than 25 lines, ... you can't even use scrollback
> > because scrollback is cleared when you change consoles. X runs by default
> > on tty7, and the kernel dumps it somewhere else. (And even if it dumped to
> > tty7 directly, you would not see it.)
>
> What to do about oopses whilst in X has been the subject of much
> head-scratching for years now. It's come up at least at the
> last two kernel summits, and I'll hazard a guess it'll come up
> again this year. The amount of work necessary to make it all
> work on both kernel side and X side isn't insubstantial however,
> so I wouldn't count on it working too soon.
hmm, if you can hope that someone will grab a camera to report an oops,
how about them grabbing a tape recorder/mp3 recorder to record audio from
the speaker. it's not fast, but you don't have that much data to output,
do it in morse (with the audio explanation of what's going to happen
first)
David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
* Re: oops pauser.
2006-01-06 1:24 ` David Lang
@ 2006-01-06 1:41 ` Josef Sipek
0 siblings, 0 replies; 40+ messages in thread
From: Josef Sipek @ 2006-01-06 1:41 UTC (permalink / raw)
To: David Lang; +Cc: Dave Jones, Jan Engelhardt, linux-kernel
On Thu, Jan 05, 2006 at 05:24:01PM -0800, David Lang wrote:
> On Thu, 5 Jan 2006, Dave Jones wrote:
>
> >> (*) If the oops is longer than 25 lines, ... you can't even use
> >scrollback
> >> because scrollback is cleared when you change consoles. X runs by default
> >> on tty7, and the kernel dumps it somewhere else. (And even if it dumped
> >to
> >> tty7 directly, you would not see it.)
> >
> >What to do about oopses whilst in X has been the subject of much
> >head-scratching for years now. It's come up at least at the
> >last two kernel summits, and I'll hazard a guess it'll come up
> >again this year. The amount of work necessary to make it all
> >work on both kernel side and X side isn't insubstantial however,
> >so I wouldn't count on it working too soon.
>
> hmm, if you can hope that someone will grab a camera to report an oops,
> how about them grabbing a tape recorder/mp3 recorder to record audio from
> the speaker. it's not fast, but you don't have that much data to output,
> do it in morse (with the audio explanation of what's going to happen
> first)
There is a patch somewhere that uses the keyboard lights to "display" panics,
and a comment that the PC speaker implementation is left up to the reader :)
It shouldn't be hard to do, then all you need is just one printk telling the user
to record it :)
Jeff.
* Re: oops pauser.
2006-01-05 10:33 ` Dave Jones
` (2 preceding siblings ...)
2006-01-06 1:24 ` David Lang
@ 2006-01-08 13:38 ` Ville Herva
2006-01-08 13:53 ` Randy.Dunlap
3 siblings, 1 reply; 40+ messages in thread
From: Ville Herva @ 2006-01-08 13:38 UTC (permalink / raw)
To: linux-kernel
On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
>
> If I had any faith in the sturdiness of the floppy driver, I'd
> recommend someone looked into a 'dump oops to floppy' patch, but
> it too relies on a large part of the system being in a sane
> enough state to write blocks out to disk.
I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
minimal 16-bit floppy driver to save the oops dump.
Kmsgdump has been around for ages and still works with 2.6.x. I almost
always use it (all of my boxes still have floppy drives.)
-- v --
v@iki.fi
* Re: oops pauser.
2006-01-08 13:38 ` Ville Herva
@ 2006-01-08 13:53 ` Randy.Dunlap
2006-01-08 19:35 ` Jan Engelhardt
2006-01-08 19:40 ` Grant Coady
0 siblings, 2 replies; 40+ messages in thread
From: Randy.Dunlap @ 2006-01-08 13:53 UTC (permalink / raw)
To: vherva; +Cc: linux-kernel
On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:
> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
> >
> > If I had any faith in the sturdiness of the floppy driver, I'd
> > recommend someone looked into a 'dump oops to floppy' patch, but
> > it too relies on a large part of the system being in a sane
> > enough state to write blocks out to disk.
>
> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> minimal 16-bit floppy driver to save the oops dump.
It just switches to real mode and uses BIOS calls.
> Kmsgdump has been around for ages and still works with 2.6.x. I almost
> always use it (all of my boxes still have floppy drives.)
---
~Randy
* Re: oops pauser.
2006-01-08 13:53 ` Randy.Dunlap
@ 2006-01-08 19:35 ` Jan Engelhardt
2006-01-09 1:43 ` Randy.Dunlap
2006-01-08 19:40 ` Grant Coady
1 sibling, 1 reply; 40+ messages in thread
From: Jan Engelhardt @ 2006-01-08 19:35 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: vherva, linux-kernel
>> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
>> minimal 16-bit floppy driver to save the oops dump.
>
>It just switches to real mode and uses BIOS calls.
>
This technique btw is what I suggested (switch to 80x50 vga mode
(if not in X)) in case of a longer oops trace.
Jan Engelhardt
--
* Re: oops pauser.
2006-01-08 19:35 ` Jan Engelhardt
@ 2006-01-09 1:43 ` Randy.Dunlap
0 siblings, 0 replies; 40+ messages in thread
From: Randy.Dunlap @ 2006-01-09 1:43 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: vherva, linux-kernel
On Sun, 8 Jan 2006 20:35:08 +0100 (MET) Jan Engelhardt wrote:
> >> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> >> minimal 16-bit floppy driver to save the oops dump.
> >
> >It just switches to real mode and uses BIOS calls.
> >
>
> This technique btw is what I suggested (switch to 80x50 vga mode
> (if not in X)) in case of a longer oops trace.
kmsgdump already shows all of the kernel log buffer that is in
memory (has not been written to disk, basically).
If I (or we) had some time and motivation, I have a
contributed patch to kmsgdump that:
a. saves and dumps all of the kernel log buffer
(reminder: current dump targets are display, parallel port
printer, and legacy floppy disk)
b. adds a hard disk dump target and attempts to make this safe
by pre-reserving and writing each block of it with a
signature + block number (and maybe more, I'm not sure
right now)
c. adds x86-64 support
but I have not merged this code into kmsgdump yet, nor have
I even tested it. I can't test the x86-64 support since I
don't (yet) have an x86-64 system available for this.
If anyone wants to work on this, I'll put the additional
code on the web.
---
~Randy
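
One possible reading of item (b) above, sketched in user-space C: every block of the reserved dump area is stamped in advance with a signature and its own block number, and at dump time a block is only overwritten if the stamp is still intact. This is not taken from the kmsgdump patch; the marker string, the 512-byte block size, the on-disk layout and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DUMP_BLOCK_SIZE 512u
#define DUMP_SIG        "OOPSDUMP"  /* hypothetical marker, 8 bytes */
#define DUMP_SIG_LEN    8u

/* Fill a reserved block with the signature followed by its block number
 * (little-endian), zeroing the rest. */
void dump_block_stamp(uint8_t *block, uint32_t index)
{
    assert(block != NULL);
    memset(block, 0, DUMP_BLOCK_SIZE);
    memcpy(block, DUMP_SIG, DUMP_SIG_LEN);
    block[DUMP_SIG_LEN + 0] = (uint8_t)(index & 0xff);
    block[DUMP_SIG_LEN + 1] = (uint8_t)((index >> 8) & 0xff);
    block[DUMP_SIG_LEN + 2] = (uint8_t)((index >> 16) & 0xff);
    block[DUMP_SIG_LEN + 3] = (uint8_t)((index >> 24) & 0xff);
}

/* Return 1 only if the block still carries the signature and the expected
 * block number, i.e. it is safe to overwrite with dump data. */
int dump_block_is_ours(const uint8_t *block, uint32_t index)
{
    uint32_t stored;

    assert(block != NULL);
    if (memcmp(block, DUMP_SIG, DUMP_SIG_LEN) != 0)
        return 0;

    stored = (uint32_t)block[DUMP_SIG_LEN + 0]
           | ((uint32_t)block[DUMP_SIG_LEN + 1] << 8)
           | ((uint32_t)block[DUMP_SIG_LEN + 2] << 16)
           | ((uint32_t)block[DUMP_SIG_LEN + 3] << 24);
    return stored == index;
}
```

The point of checking the block number as well as the signature is that a stale or relocated block from the same dump area is rejected too, not just unrelated data.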
* Re: oops pauser.
2006-01-08 13:53 ` Randy.Dunlap
2006-01-08 19:35 ` Jan Engelhardt
@ 2006-01-08 19:40 ` Grant Coady
2006-01-09 1:45 ` Randy.Dunlap
1 sibling, 1 reply; 40+ messages in thread
From: Grant Coady @ 2006-01-08 19:40 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: vherva, linux-kernel
On Sun, 8 Jan 2006 05:53:22 -0800, "Randy.Dunlap" <rdunlap@xenotime.net> wrote:
>On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:
>
>> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
>> >
>> > If I had any faith in the sturdiness of the floppy driver, I'd
>> > recommend someone looked into a 'dump oops to floppy' patch, but
>> > it too relies on a large part of the system being in a sane
>> > enough state to write blocks out to disk.
>>
>> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
>> minimal 16-bit floppy driver to save the oops dump.
>
>It just switches to real mode and uses BIOS calls.
So would it be viable to take over the screen in similar fashion?
Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
screen, or Poops for short :o)
Grant.
* Re: oops pauser.
2006-01-08 19:40 ` Grant Coady
@ 2006-01-09 1:45 ` Randy.Dunlap
2006-01-09 16:15 ` Jan Engelhardt
0 siblings, 1 reply; 40+ messages in thread
From: Randy.Dunlap @ 2006-01-09 1:45 UTC (permalink / raw)
To: gcoady; +Cc: vherva, linux-kernel
On Mon, 09 Jan 2006 06:40:57 +1100 Grant Coady wrote:
> On Sun, 8 Jan 2006 05:53:22 -0800, "Randy.Dunlap" <rdunlap@xenotime.net> wrote:
>
> >On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:
> >
> >> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
> >> >
> >> > If I had any faith in the sturdiness of the floppy driver, I'd
> >> > recommend someone looked into a 'dump oops to floppy' patch, but
> >> > it too relies on a large part of the system being in a sane
> >> > enough state to write blocks out to disk.
> >>
> >> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> >> minimal 16-bit floppy driver to save the oops dump.
> >
> >It just switches to real mode and uses BIOS calls.
>
> So would it be viable to take over the screen in similar fashion?
>
> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
> screen, or Poops for short :o)
It does take over the screen. 80x50 isn't needed since it knows how
to scroll the kernel log buffer on 80x25.
---
~Randy
* Re: oops pauser.
2006-01-09 1:45 ` Randy.Dunlap
@ 2006-01-09 16:15 ` Jan Engelhardt
2006-01-09 16:25 ` Ville Herva
2006-01-09 16:39 ` Randy.Dunlap
0 siblings, 2 replies; 40+ messages in thread
From: Jan Engelhardt @ 2006-01-09 16:15 UTC (permalink / raw)
To: Randy.Dunlap; +Cc: gcoady, vherva, linux-kernel
>> So would it be viable to take over the screen in similar fashion?
>>
>> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
>> screen, or Poops for short :o)
>
>It does take over the screen. 80x50 isn't needed since it knows how
>to scroll the kernel log buffer on 80x25.
It's needed because scrolling back might be impossible (shift-up in panic
= no-go), not because it knows how to scroll.
Jan Engelhardt
--
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/
* Re: oops pauser.
2006-01-09 16:15 ` Jan Engelhardt
@ 2006-01-09 16:25 ` Ville Herva
2006-01-09 16:39 ` Randy.Dunlap
1 sibling, 0 replies; 40+ messages in thread
From: Ville Herva @ 2006-01-09 16:25 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Randy.Dunlap, gcoady, linux-kernel
On Mon, Jan 09, 2006 at 05:15:55PM +0100, you [Jan Engelhardt] wrote:
> >> So would it be viable to take over the screen in similar fashion?
> >>
> >> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
> >> screen, or Poops for short :o)
> >
> >It does take over the screen. 80x50 isn't needed since it knows how
> >to scroll the kernel log buffer on 80x25.
>
> It's needed because scrolling back might be impossible (shift-up in panic
> = no-go), not because it knows how to scroll.
Please try kmsgdump.
It has its own real-mode terminal (with scrolling) to which it switches on
oops. Hung kernel console doesn't affect it.
-- v --
v@iki.fi
* Re: oops pauser.
2006-01-09 16:15 ` Jan Engelhardt
2006-01-09 16:25 ` Ville Herva
@ 2006-01-09 16:39 ` Randy.Dunlap
1 sibling, 0 replies; 40+ messages in thread
From: Randy.Dunlap @ 2006-01-09 16:39 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Randy.Dunlap, gcoady, vherva, linux-kernel
On Mon, 9 Jan 2006, Jan Engelhardt wrote:
> >> So would it be viable to take over the screen in similar fashion?
> >>
> >> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
> >> screen, or Poops for short :o)
> >
> >It does take over the screen. 80x50 isn't needed since it knows how
> >to scroll the kernel log buffer on 80x25.
>
> It's needed because scrolling back might be impossible (shift-up in panic
> = no-go), not because it knows how to scroll.
Oh, I see. You are talking about the kernel message(s), not
kmsgdump. Sorry, I switched to kmsgdump there somehow.
Yes, more info on the screen from the kernel would be good.
--
~Randy
* Re: oops pauser.
2006-01-05 4:52 Dave Jones
2006-01-05 8:15 ` Jan Engelhardt
@ 2006-01-05 13:37 ` Alan Cox
2006-01-05 20:52 ` Dave Jones
2006-01-05 13:58 ` Avishay Traeger
2006-01-05 14:39 ` Kyle McMartin
3 siblings, 1 reply; 40+ messages in thread
From: Alan Cox @ 2006-01-05 13:37 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-kernel
On Mer, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
> With this patch, if we oops, there's a pause for two minutes..
> which hopefully gives people enough time to grab a digital camera
> to take a screenshot of the oops.
This appears to reduce the amount of information available as an oops
instead of spewing to the log and continuing generally will hang the box
stopping the scroll keys being used or dmesg being used to get the data
out.
Who is going to wait two minutes for an oops when for most users it's
their only box. Instead of pasting reports people will now reboot, or
perhaps send you the half a report they can see (which because we dump
too much info by default to fit the screen is also useless).
> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes.
The console has awareness of graphic/text mode at all times and knows
what is going on. Why not use that information if you must go this way ?
Alan
* Re: oops pauser.
2006-01-05 13:37 ` Alan Cox
@ 2006-01-05 20:52 ` Dave Jones
2006-01-06 13:31 ` Alan Cox
2006-01-06 15:22 ` Pavel Machek
0 siblings, 2 replies; 40+ messages in thread
From: Dave Jones @ 2006-01-05 20:52 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel
On Thu, Jan 05, 2006 at 01:37:33PM +0000, Alan Cox wrote:
> On Mer, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
> > With this patch, if we oops, there's a pause for two minutes..
> > which hopefully gives people enough time to grab a digital camera
> > to take a screenshot of the oops.
>
> This appears to reduce the amount of information available as an oops
> instead of spewing to the log
A huge number of oopses never hit the logs.
They either hit early in boot before syslog is even running, or
they kill the box.
> and continuing generally will hang the box
> stopping the scroll keys being used or dmesg being used to get the data
> out.
This is exactly the problem this patch addresses.
The 'scroll keys' do not work in cases where we lock up after an oops.
If the useful parts of the oops scrolled off the top of the screen, we've
lost any chance of debugging whatever just happened.
> Who is going to wait two minutes for an oops when for most users it's
> their only box.
The real-world disagrees with you. In the few weeks it's been in Fedora,
several previously undiagnosable oopses were caught, and even *users*
agreed it was a useful addition. If the two minutes is excessive, we can
lower it, or even make it a boot-option.
Another possibility is instantly continuing after a keypress.
> Instead of pasting reports people will now reboot, or
> perhaps send you the half a report they can see (which because we dump
> too much info by default to fit the screen is also useless).
See the other patch which halves the number of lines needed for a backtrace.
With that, even if the user is running 25 line high displays, we've
a pretty good chance it'll fit except for really long backtraces,
and if that's the case, we can ask users to try to reproduce after
booting with vga=1, (or better, vga=791 for eg).
> > The one case this doesn't catch is the problem of oopses whilst
> > in X. Previously a non-fatal oops would stall X momentarily,
> > and then things continue. Now those cases will lock up completely
> > for two minutes.
>
> The console has awareness of graphic/text mode at all times and knows
> what is going on. Why not use that information if you must go this way ?
If we've just oopsed, the console may have no awareness of what day it is,
let alone anything about video modes. I'm not entirely sure what you're
suggesting, but it gives me the creeps. Are you talking about switching
away from X back to a tty when we oops?
Dave
* Re: oops pauser.
2006-01-05 20:52 ` Dave Jones
@ 2006-01-06 13:31 ` Alan Cox
2006-01-06 20:33 ` Dave Jones
2006-01-06 15:22 ` Pavel Machek
1 sibling, 1 reply; 40+ messages in thread
From: Alan Cox @ 2006-01-06 13:31 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-kernel
On Iau, 2006-01-05 at 15:52 -0500, Dave Jones wrote:
> A huge number of oopses never hit the logs.
> They either hit early in boot before syslog is even running, or
> they kill the box.
So you don't need a two minute delay for those because as you said it
froze the box
>
> > and continuing generally will hang the box
> > stopping the scroll keys being used or dmesg being used to get the data
> > out.
>
> This is exactly the problem this patch addresses.
> The 'scroll keys' do not work in cases where we lock up after an oops.
And in those cases the 2 minute freeze is meaningless
> The real-world disagrees with you. In the few weeks it's been in Fedora,
> several previously undiagnosable oopses were caught, and even *users*
> agreed it was a useful addition. If the two minutes is excessive, we can
> lower it, or even make it a boot-option.
Any change will capture different oopses. A boot option isn't a bad idea,
or for that matter also truncating the call trace to the *top* few (or
as Bryce suggested on irc reversing the printing order)
> Another possibility is instantly continuing after a keypress.
If the input layer is running that would be sensible.
> > The console has awareness of graphic/text mode at all times and knows
> > what is going on. Why not use that information if you must go this way ?
>
> If we've just oopsed, the console may have no awareness of what day it is,
> let alone anything about video modes. I'm not entirely sure what you're
> suggesting, but it gives me the creeps. Are you talking about switching
> away from X back to a tty when we oops?
Well you could try and do that but I was more thinking that if the
console has been told we are in graphics mode then the 2 minute delay
shouldn't occur.
Alan
* Re: oops pauser.
2006-01-06 13:31 ` Alan Cox
@ 2006-01-06 20:33 ` Dave Jones
0 siblings, 0 replies; 40+ messages in thread
From: Dave Jones @ 2006-01-06 20:33 UTC (permalink / raw)
To: Alan Cox; +Cc: linux-kernel
On Fri, Jan 06, 2006 at 01:31:10PM +0000, Alan Cox wrote:
> On Iau, 2006-01-05 at 15:52 -0500, Dave Jones wrote:
> > A huge number of oopses never hit the logs.
> > They either hit early in boot before syslog is even running, or
> > they kill the box.
>
> So you don't need a two minute delay for those because as you said it
> froze the box
it froze *AFTER* the oops had scrolled off the top of the screen.
The sequence of events before
oops
scrolly scrolly
random crap about sleeping whilst atomic or the like
scrolly scrolly
HANG
with this patch..
oops
*pause for two minutes whilst user takes a picture/scribbles it down*
scrolly scrolly
random crap about sleeping whilst atomic or the like
scrolly scrolly
HANG
> > > and continuing generally will hang the box
> > > stopping the scroll keys being used or dmesg being used to get the data
> > > out.
> >
> > This is exactly the problem this patch addresses.
> > The 'scroll keys' do not work in cases where we lock up after an oops.
>
> And in those cases the 2 minute freeze is meaningless
It isn't meaningless if it stops the oops scrolling off the screen long
enough to capture it.
> > Another possibility is instantly continuing after a keypress.
> If the input layer is running that would be sensible.
Yeah, questionable. And polling hardware won't work due to usb keyboards.
> > If we've just oopsed, the console may have no awareness of what day it is,
> > let alone anything about video modes. I'm not entirely sure what you're
> > suggesting, but it gives me the creeps. Are you talking about switching
> > away from X back to a tty when we oops?
>
> Well you could try and do that but I was more thinking that if the
> console has been told we are in graphics mode then the 2 minute delay
> shouldn't occur.
Hmm. I'll look into that.
Any pointers ? (I don't want to spend longer than necessary looking
in that code :-)
Dave
* Re: oops pauser.
2006-01-05 20:52 ` Dave Jones
2006-01-06 13:31 ` Alan Cox
@ 2006-01-06 15:22 ` Pavel Machek
2006-01-06 19:06 ` Jan Engelhardt
2006-01-06 22:48 ` Dave Jones
1 sibling, 2 replies; 40+ messages in thread
From: Pavel Machek @ 2006-01-06 15:22 UTC (permalink / raw)
To: Dave Jones, Alan Cox, linux-kernel
Hi!
> > > The one case this doesn't catch is the problem of oopses whilst
> > > in X. Previously a non-fatal oops would stall X momentarily,
> > > and then things continue. Now those cases will lock up completely
> > > for two minutes.
> >
> > The console has awareness of graphic/text mode at all times and knows
> > what is going on. Why not use that information if you must go this way ?
>
> If we've just oopsed, the console may have no awareness of what day it is,
> let alone anything about video modes. I'm not entirely sure what you're
> suggesting, but it gives me the creeps. Are you talking about switching
> away from X back to a tty when we oops?
No.
But you _know_ if user is running X or not -- notice that kernel does
not attempt to printk() when X is running, because that could lock up
the box.
If user is running X, you don't need the delay.
    if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
        /* visible text console: pause so the oops can be read */
        mdelay(10 * 1000);    /* 10 seconds */
    }
or something like that should do the trick.
Pavel
--
Thanks, Sharp!
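
The rule proposed above (pause only when a text console is actually on screen) can be written as a small pure function and checked outside the kernel. The struct, enum and function below are stand-ins invented for this sketch; in the kernel the same information would come from `CON_IS_VISIBLE(vc)` and `vc->vc_mode`, as in the snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the console's text/graphics state (hypothetical names). */
enum console_mode { MODE_TEXT, MODE_GRAPHICS };

struct console_state {
    bool visible;               /* is this the console currently shown? */
    enum console_mode mode;     /* text console, or graphics (e.g. X)   */
};

/*
 * Return how long to pause after an oops, in seconds: the full `pause`
 * when a text console is visible, otherwise 0, because nobody can read
 * the oops anyway and the stall would only freeze the desktop.
 */
unsigned int oops_pause_seconds(const struct console_state *con,
                                unsigned int pause)
{
    assert(con != NULL);
    if (con->visible && con->mode == MODE_TEXT)
        return pause;
    return 0;
}
```

Keeping the decision separate from the delay itself makes it easy to later replace the fixed pause with a boot option, which was also suggested in this thread.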
* Re: oops pauser.
2006-01-06 15:22 ` Pavel Machek
@ 2006-01-06 19:06 ` Jan Engelhardt
2006-01-06 22:34 ` Pavel Machek
2006-01-06 22:48 ` Dave Jones
1 sibling, 1 reply; 40+ messages in thread
From: Jan Engelhardt @ 2006-01-06 19:06 UTC (permalink / raw)
To: Pavel Machek; +Cc: Dave Jones, Alan Cox, linux-kernel
>No.
>
>But you _know_ if user is running X or not -- notice that kernel does
>not attempt to printk() when X is running, because that could lock up
>the box.
>
>If user is running X, you don't need the delay.
>
>if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
Does framebuffer fall under KD_TEXT?
Jan Engelhardt
--
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-06 19:06 ` Jan Engelhardt
@ 2006-01-06 22:34 ` Pavel Machek
0 siblings, 0 replies; 40+ messages in thread
From: Pavel Machek @ 2006-01-06 22:34 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Dave Jones, Alan Cox, linux-kernel
On Fri 06-01-06 20:06:36, Jan Engelhardt wrote:
> >No.
> >
> >But you _know_ if user is running X or not -- notice that kernel does
> >not attempt to printk() when X is running, because that could lock up
> >the box.
> >
> >If user is running X, you don't need the delay.
> >
> >if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
>
> Does framebuffer fall under KD_TEXT?
I think so.
--
Thanks, Sharp!
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-06 15:22 ` Pavel Machek
2006-01-06 19:06 ` Jan Engelhardt
@ 2006-01-06 22:48 ` Dave Jones
1 sibling, 0 replies; 40+ messages in thread
From: Dave Jones @ 2006-01-06 22:48 UTC (permalink / raw)
To: Pavel Machek; +Cc: Alan Cox, linux-kernel
On Fri, Jan 06, 2006 at 04:22:03PM +0100, Pavel Machek wrote:
> Hi!
>
> > > > The one case this doesn't catch is the problem of oopses whilst
> > > > in X. Previously a non-fatal oops would stall X momentarily,
> > > > and then things continue. Now those cases will lock up completely
> > > > for two minutes.
> > >
> > > The console has awareness of graphic/text mode at all times and knows
> > > what is going on. Why not use that information if you must go this way ?
> >
> > If we've just oopsed, the console may have no awareness of what day it is,
> > yet alone anything about video modes. I'm not entirely sure what you're
> > suggesting, but it gives me the creeps. Are you talking about switching
> > away from X back to a tty when we oops?
>
> No.
>
> But you _know_ if user is running X or not -- notice that kernel does
> not attempt to printk() when X is running, because that could lock up
> the box.
>
> If user is running X, you don't need the delay.
>
> if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
> delay(10sec)
> }
From this context, though, we don't have a 'vc' to reference,
so we'll need to find out from the console layer somehow which
one is the current vc.
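One way to picture that lookup, as a user-space sketch only: the console layer keeps an index of the foreground console and a table of per-console data, and the oops path would consult those instead of taking a 'vc' argument. Every name ending in _sketch below is a hypothetical stand-in, not the kernel's own declaration, and the table size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CONSOLES_SKETCH 4	/* arbitrary size for the sketch */
#define KD_TEXT     0x00	/* values as in <linux/kd.h> */
#define KD_GRAPHICS 0x01

/* Hypothetical stand-ins for the console layer's per-console data. */
struct vc_data_sketch { int vc_mode; };
struct vc_slot_sketch { struct vc_data_sketch *d; };	/* NULL if unallocated */

static struct vc_slot_sketch vc_cons_sketch[NR_CONSOLES_SKETCH];
static int fg_console_sketch;	/* index of the console currently shown */

/* Look up the foreground console without being handed a 'vc'. */
static struct vc_data_sketch *current_vc(void)
{
	assert(fg_console_sketch >= 0 && fg_console_sketch < NR_CONSOLES_SKETCH);
	return vc_cons_sketch[fg_console_sketch].d;
}

/* True only if a foreground console exists and is in text mode. */
static int foreground_is_text(void)
{
	const struct vc_data_sketch *vc = current_vc();

	return vc != NULL && vc->vc_mode == KD_TEXT;
}
```

The NULL check matters in this model because a slot may never have been allocated; in that case the sketch simply reports "not text" and no pause would happen.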
Dave
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-05 4:52 Dave Jones
2006-01-05 8:15 ` Jan Engelhardt
2006-01-05 13:37 ` Alan Cox
@ 2006-01-05 13:58 ` Avishay Traeger
2006-01-05 20:54 ` Dave Jones
2006-01-06 0:19 ` Josef Sipek
2006-01-05 14:39 ` Kyle McMartin
3 siblings, 2 replies; 40+ messages in thread
From: Avishay Traeger @ 2006-01-05 13:58 UTC (permalink / raw)
To: Dave Jones; +Cc: linux-kernel
Some comments:
1. I think this is a good idea, since serial consoles can also change
timings. I have seen several race conditions where the problem goes
away once I add a serial console.
2. Should this be a separate debugging option?
3. Shouldn't you have KERN____ in your printk statements?
4. Wouldn't printing out the message every second make the oops scroll
off the screen, defeating the purpose of the patch?
Avishay Traeger
http://www.fsl.cs.sunysb.edu/~avishay/
On Wed, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
> In my quest to get better debug data from users in Fedora bug reports,
> I came up with this patch. A majority of users don't have serial
> consoles, so when an oops scrolls off the top of the screen,
> and locks up, they usually end up reporting a 2nd (or later) oops
> that isn't particularly helpful (or worse, some inconsequential
> info like 'sleeping whilst atomic' warnings)
>
> With this patch, if we oops, there's a pause for a two minutes..
> which hopefully gives people enough time to grab a digital camera
> to take a screenshot of the oops.
>
> It has an on-screen timer so the user knows what's going on,
> (and that it's going to come back to life [maybe] after the oops).
>
> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes. Future patches could add some additional feedback
> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.
>
> Signed-off-by: Dave Jones <davej@redhat.com>
>
> --- vanilla/arch/i386/kernel/traps.c 2006-01-02 22:21:10.000000000 -0500
> +++ linux-2.6.15/arch/i386/kernel/traps.c 2006-01-04 23:42:46.000000000 -0500
> @@ -256,6 +271,15 @@ void show_registers(struct pt_regs *regs
> }
> }
> printk("\n");
> + {
> + int i;
> + for (i=120;i>0;i--) {
> + mdelay(1000);
> + touch_nmi_watchdog();
> + printk("Continuing in %d seconds. \r", i);
> + }
> + printk("\n");
> + }
> }
>
> static void handle_BUG(struct pt_regs *regs)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-05 13:58 ` Avishay Traeger
@ 2006-01-05 20:54 ` Dave Jones
2006-01-06 0:19 ` Josef Sipek
1 sibling, 0 replies; 40+ messages in thread
From: Dave Jones @ 2006-01-05 20:54 UTC (permalink / raw)
To: Avishay Traeger; +Cc: linux-kernel
On Thu, Jan 05, 2006 at 08:58:53AM -0500, Avishay Traeger wrote:
> Some comments:
> 1. I think this is a good idea, since serial consoles can also change
> timings. I have seen several race conditions where the problem goes
> away once I add a serial console.
> 2. Should this be a separate debugging option?
Maybe.
> 3. Shouldn't you have KERN____ in your printk statements?
It doesn't make a great deal of difference in this context.
> 4. Wouldn't printing out the message every second make the oops scroll
> off the screen, defeating the purpose of the patch?
No. That's why it uses \r instead of \n.
Dave
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-05 13:58 ` Avishay Traeger
2006-01-05 20:54 ` Dave Jones
@ 2006-01-06 0:19 ` Josef Sipek
2006-01-06 1:12 ` Bernd Eckenfels
1 sibling, 1 reply; 40+ messages in thread
From: Josef Sipek @ 2006-01-06 0:19 UTC (permalink / raw)
To: Avishay Traeger; +Cc: Dave Jones, linux-kernel
On Thu, Jan 05, 2006 at 08:58:53AM -0500, Avishay Traeger wrote:
> Some comments:
> 1. I think this is a good idea, since serial consoles can also change
> timings. I have seen several race conditions where the problem goes
> away once I add a serial console.
Agreed.
> 2. Should this be a separate debugging option?
Agreed.
> 3. Shouldn't you have KERN____ in your printk statements?
That's something to watch out for. If you have, say:
printk(KERN_DEBUG "fooo.....");
do_foo();
printk(KERN_DEBUG "done.\n");
Then, you'll get the extra "<7>" on the screen and in the logs (assuming
you set the printk levels to display KERN_DEBUG).
Now, I'm not 100% sure about '\r', but I suspect it does the same thing.
> 4. Wouldn't printing out the message every second make the oops scroll
> off the screen, defeating the purpose of the patch?
No. Read the patch carefully: it uses '\r' to go back to the beginning of
the line and overwrite the message.
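The same trick can be tried in user space. This is only a sketch of the technique, with stdio standing in for printk and the one-second delay left out; the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one countdown line. The trailing spaces wipe leftovers from a
 * longer previous line (e.g. "100" followed by "99"), and '\r' moves the
 * cursor back to column 0 without starting a new line.
 */
static int format_countdown(char *buf, size_t len, int seconds_left)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "Continuing in %d seconds.   \r", seconds_left);
}

/* Print the whole countdown on a single screen line. */
static void countdown(FILE *out, int seconds)
{
	char line[64];
	int i;

	for (i = seconds; i > 0; i--) {
		format_countdown(line, sizeof(line), i);
		fputs(line, out);
		fflush(out);
		/* a real one-second pause would go here */
	}
	fputc('\n', out);	/* move on only once the countdown is over */
}
```

Because no '\n' is emitted until the loop ends, the text above the countdown line never scrolls, however many iterations run.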
Jeff.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-06 0:19 ` Josef Sipek
@ 2006-01-06 1:12 ` Bernd Eckenfels
2006-01-06 1:35 ` Josef Sipek
0 siblings, 1 reply; 40+ messages in thread
From: Bernd Eckenfels @ 2006-01-06 1:12 UTC (permalink / raw)
To: linux-kernel
Josef Sipek <jsipek@fsl.cs.sunysb.edu> wrote:
> That's something to watch out for. If you have, say:
>
> printk(KERN_DEBUG "fooo.....");
> do_foo();
> printk(KERN_DEBUG "done.\n");
Don't do it. It is better to have the time stamps for both and to have atomic
prints. In fact I would disallow this and add automatic linebreaks.
Gruss
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-06 1:12 ` Bernd Eckenfels
@ 2006-01-06 1:35 ` Josef Sipek
2006-01-06 2:21 ` Bernd Eckenfels
0 siblings, 1 reply; 40+ messages in thread
From: Josef Sipek @ 2006-01-06 1:35 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel
On Fri, Jan 06, 2006 at 02:12:59AM +0100, Bernd Eckenfels wrote:
> Josef Sipek <jsipek@fsl.cs.sunysb.edu> wrote:
> > That's something to watch out for. If you have, say:
> >
> > printk(KERN_DEBUG "fooo.....");
> > do_foo();
> > printk(KERN_DEBUG "done.\n");
>
> Don't do it. It is better to have the time stamps for both and to have atomic
> prints.
First of all, the above code is just to illustrate a point. And as a matter of
fact, it may not even work: if some other kernel thread prints something while
do_foo() is executing, the whole thing will get screwed up.
If I remember correctly, the second line of the "sample" code will _NOT_
produce a timestamp. So the output will be:
[1234567.123456] fooo.....<7>done.
where the timestamp is that of the first printk.
> In fact I would disallow this and add automatic linebreaks.
I wouldn't go that far. I'd just let the kernel janitors have fun with
the existing code :)
Jeff.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-06 1:35 ` Josef Sipek
@ 2006-01-06 2:21 ` Bernd Eckenfels
0 siblings, 0 replies; 40+ messages in thread
From: Bernd Eckenfels @ 2006-01-06 2:21 UTC (permalink / raw)
To: linux-kernel
Josef Sipek <jsipek@fsl.cs.sunysb.edu> wrote:
> First of all, the above code is just to illustrate a point. And as a matter of
> fact, it may not even work: if some other kernel thread prints something while
> do_foo() is executing, the whole thing will get screwed up.
That's another reason not to do it. To me, this means we do not need to
support or optimize for this kind of printk abuse.
> If I remember correctly, the second line of the "sample" code will _NOT_
> produce a timestamp. So the output will be:
>
> [1234567.123456] fooo.....<7>done.
> where the timestamp is that of the first printk.
Yes, that's the other problem: you miss the timestamp for the end of a
long-running operation. That's why it is better to have that in two lines
(maybe the second line with a lower severity).
Gruss
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: oops pauser.
2006-01-05 4:52 Dave Jones
` (2 preceding siblings ...)
2006-01-05 13:58 ` Avishay Traeger
@ 2006-01-05 14:39 ` Kyle McMartin
3 siblings, 0 replies; 40+ messages in thread
From: Kyle McMartin @ 2006-01-05 14:39 UTC (permalink / raw)
To: Dave Jones, linux-kernel
On Wed, Jan 04, 2006 at 11:52:12PM -0500, Dave Jones wrote:
> printk("\n");
> + {
> + int i;
> + for (i=120;i>0;i--) {
> + mdelay(1000);
> + touch_nmi_watchdog();
> + printk("Continuing in %d seconds. \r", i);
> + }
> + printk("\n");
> + }
>
Nice, this is cool. Though perhaps it would be better if the loop length
were a command-line argument, as with panic_timeout?
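As a rough illustration of that idea, here is a user-space sketch that pulls a pause length out of a command-line string. The option name "oops_pause=" and the 3600-second cap are invented for the example; in the kernel itself the hook would more likely be a __setup() handler like the one behind "panic=", which is not modelled here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_OOPS_PAUSE 120	/* seconds, as hard-coded in the patch */
#define MAX_OOPS_PAUSE     3600	/* arbitrary sanity limit for the sketch */

/* Hypothetical option name, chosen only for this illustration. */
static const char oops_pause_key[] = "oops_pause=";

/*
 * Scan a space-separated command line for "oops_pause=N" and return N.
 * Falls back to the default when the option is absent or malformed.
 */
static int parse_oops_pause(const char *cmdline)
{
	const size_t klen = sizeof(oops_pause_key) - 1;
	const char *p = cmdline;

	assert(cmdline != NULL);
	while (*p != '\0') {
		while (*p == ' ')	/* skip to the start of the next token */
			p++;
		if (strncmp(p, oops_pause_key, klen) == 0) {
			const char *val = p + klen;
			char *end;
			long v;

			if (!isdigit((unsigned char)*val))
				return DEFAULT_OOPS_PAUSE;
			v = strtol(val, &end, 10);
			if ((*end == ' ' || *end == '\0') && v <= MAX_OOPS_PAUSE)
				return (int)v;
			return DEFAULT_OOPS_PAUSE;
		}
		while (*p != '\0' && *p != ' ')	/* skip the rest of this token */
			p++;
	}
	return DEFAULT_OOPS_PAUSE;
}
```

A value of 0 would then be a natural way to switch the pause off entirely, which also covers the "separate debugging option" question raised elsewhere in the thread.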
Cheers,
Kyle
^ permalink raw reply [flat|nested] 40+ messages in thread