linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.4.14-pre8 Alt-SysRq-[TM] failure during lockup...
@ 2001-11-05  5:40 Eric W. Biederman
  2001-11-05 18:37 ` Randy.Dunlap
  0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2001-11-05  5:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


Summary:  I triggered a condition in 2.4.14-pre8 where SysRq triggered
but would not print reports.  I managed to unstick the condition but
had played to much to determine the root cause.  My guess is that
somehow my default loglevel was messed up.  Full information is
provided just case I did not muddy the waters too much.

I accidentily typed mke2fs /dev/hdc instead of mke2fs /dev/hdc6 on 
my machine flinx a few days ago.  So to recover the machine I made up a
boot flopy intially with pre7 (which rebooted while I wasn't in the
room) and then with pre8.  I booted the machine with a read-only nfs
root.  I have been recovering the machine by untaring a multiple
volume tarball stored, on CDs, which I mounted via NFS from my other
machine.  To unmount and remount the CDs I was starting and stopping
the nfs server.

flinx is a 486-DX80 with 32MB of RAM. 1 512MB hard drive /dev/hda 
,1 20GB hard driver /dev/hdc, and 1 ancient IDE-CDRW /dev/hdd.  Also
is attached a NE2000 clone, which I believe is a 10Mbit D-Link card.
The machine has no PCI, just EISA and PCI.  It has been solidly
running linux for the past 5 years.  For the last year and a half it
has been running 2.2.12, until I typo'd.  It has a small swap
partition on /dev/hda and a 1 GB swap partition on /dev/hdc, both of
which I enabled.  

At the time of the lockup the system had been running 2.4.14-pre8 for
about 12 hours, in single user mode.  I had untar approxiamently 9GB
of data over the NFS mount, and was just finishing up.  I stopped the
NFS server and removed the last CD.  The I typed ls on flinx,
attempting to list /dev/hdc2 where I had placed all of the data.  With
the NFS server stop it hung of course.  Then I started the NFS server,
and the machine stayed locked.  I stoped and started it again with no
luck.  

I then tried playing with sysrq and this is where I got worried.
Alt-SysRq-Space gave the menu as normal.
Alt-SysRq-M (showMem) printed just: "SysRq: Show Memory"
Alt-SysRq-T (showTasks) printed just: "Sysrq: Show State"

Which is extremely strange the reports which should be at a higher
loglevel were not displayed.

Then I typoed and pressed. Alt-SysRq-E (tErm) and I started getting the
reports back.  

Also interesting is that Alt-SysRq-S prints:
SysRq : Emergency Sync
Syncing device 16:02 ... OK
Done.

Even after several times of pressing it like the sync really doesn't
happen.  I suppose this could be a strange affect of having the
filesystem mounted read-write though.

By now I have all Alt-SysRq-K (saK) and Alt-SysRq-I (kIll).  So I have
killed off all of my shells.  I had though I had pressed Alt-SysRq-L
(killalL) but that was my confusion.  I don't use SysRq much.  The
result being I doubt my system in a state that is useful for
debugging, but for completeness.

The current output of Alt-SysRq-T (showTasks) lists:
init         D   1
keventd      S   2
ksotirqd_CPU S   3
kswapd       S   4
bdflush      S   5
kupdate      S   6
rpciod       S   7
init         Z   8
rc.sysinit   Z   9
bash         Z  76
bash         Z 270

Mem-info:
Free pages:              26596kB (      0kB HighMem)
Zone: DMA freepages: 13192kB min: 512kB low: 1024kB high: 1536kB
Zone: Normal freepages: 13404kB min: 128kB low: 256kB high: 384kB
Zone: HighMem freepages:     0kb min:  0kb low: 0kb high: 0kb
( Active: 170, inactive: 150, free 6649 )
276*4Kb 267*8kB 196*16Kb 101*32kb 34*64Kb 7*128Kb 0*256kb 1*512kb 0*1024kB 0*1024kb 0*2048kb = 13192kb) 
293*4Kb 249*8kB 204*16Kb 116*32kb 37*64Kb 7*128Kb 0*256kb 0*512kb 0*1024kB 0*1024kb 0*2048kb = 13404kb) 
= 0kb)
Swap cache: add 11552, delete 11429, find 4400/5117, race 0+0
Free Swap:   1050244kb
8192 pages of RAM
0 pages of HIGHMEM
560 reserved pages
158 pages shared
123 pages swap cached
48 pages in page table cache
Buffer memory:   72kB


With tcpdump all I see on the network between my two machines is:
22:12:24.172976 flinx.1618384030 > frodo.nfs: 112 read [|nfs] (DF)
22:12:24.173019 frodo.nfs > flinx.1618384030: reply ok 28 read [|nfs]
22:12:24.174598 flinx.1635161246 > frodo.nfs: 112 read [|nfs] (DF)
22:12:24.174636 frodo.nfs > flinx.1635161246: reply ok 28 read [|nfs]
22:12:24.176419 flinx.1651938462 > frodo.nfs: 112 read [|nfs] (DF)
22:12:24.176445 frodo.nfs > flinx.1651938462: reply ok 28 read [|nfs]
22:12:24.178027 flinx.1668715678 > frodo.nfs: 112 read [|nfs] (DF)
22:12:24.182406 frodo.nfs > flinx.1668715678: reply ok 28 read [|nfs]

and ping works.

With Alt-Sys-P (ShowPc) the pids cycle through:
Pid 1: init
Pid 7: rpciod
Pid 0: ``swapper'' but really the idle task.


And finally Alt-SysRq-L (killalL) instead of panicing the kernel
Locked it up so that Alt-SysRq functions and the machine is now
completely unresponsive.

Eric

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.4.14-pre8 Alt-SysRq-[TM] failure during lockup...
  2001-11-05  5:40 2.4.14-pre8 Alt-SysRq-[TM] failure during lockup Eric W. Biederman
@ 2001-11-05 18:37 ` Randy.Dunlap
  2001-11-06  2:41   ` Eric W. Biederman
  0 siblings, 1 reply; 6+ messages in thread
From: Randy.Dunlap @ 2001-11-05 18:37 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-kernel

"Eric W. Biederman" wrote:
> 
> Summary:  I triggered a condition in 2.4.14-pre8 where SysRq triggered
> but would not print reports.  I managed to unstick the condition but
> had played to much to determine the root cause.  My guess is that
> somehow my default loglevel was messed up.  Full information is
> provided just case I did not muddy the waters too much.

Do you know what the console loglevel was when you tried
to use Alt-SysRq-M (show_mem) or Alt-SysRq-T (show tasks ==
show_state)?  (first value listed in /proc/sys/kernel/printk file)

show_mem() and show_state() don't modify the current value of
console_loglevel; they depend on the sysrq handler to do that.
That value could be too low/small.

E.g., if console_loglevel is 4, show_mem and show_state don't
show me anything on the console either, but they are added
to the log file.

[rubout]

> I then tried playing with sysrq and this is where I got worried.
> Alt-SysRq-Space gave the menu as normal.
> Alt-SysRq-M (showMem) printed just: "SysRq: Show Memory"
> Alt-SysRq-T (showTasks) printed just: "Sysrq: Show State"
> 
> Which is extremely strange the reports which should be at a higher
> loglevel were not displayed.
> 
> Then I typoed and pressed. Alt-SysRq-E (tErm) and I started getting the
> reports back.

Aye, sysrq_handle_term sets console_loglevel to 8 and leaves it there.

[rubout]

~Randy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.4.14-pre8 Alt-SysRq-[TM] failure during lockup...
  2001-11-05 18:37 ` Randy.Dunlap
@ 2001-11-06  2:41   ` Eric W. Biederman
  2001-11-06 19:10     ` Randy.Dunlap
  0 siblings, 1 reply; 6+ messages in thread
From: Eric W. Biederman @ 2001-11-06  2:41 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: linux-kernel

"Randy.Dunlap" <rddunlap@osdl.org> writes:

> "Eric W. Biederman" wrote:
> > 
> > Summary:  I triggered a condition in 2.4.14-pre8 where SysRq triggered
> > but would not print reports.  I managed to unstick the condition but
> > had played to much to determine the root cause.  My guess is that
> > somehow my default loglevel was messed up.  Full information is
> > provided just case I did not muddy the waters too much.
> 
> Do you know what the console loglevel was when you tried
> to use Alt-SysRq-M (show_mem) or Alt-SysRq-T (show tasks ==
> show_state)?  (first value listed in /proc/sys/kernel/printk file)

I was in single user mode so it shouldn't have been changed
from it's default value.  But it might have been.
 
> show_mem() and show_state() don't modify the current value of
> console_loglevel; they depend on the sysrq handler to do that.
> That value could be too low/small.

Right.  That looks to be why I couldn't see my debug information.

There was definentily something hosed at the kernel level
that shouldn't have been.  We should be very careful before
we move 2.4.14-pre8 to 2.4.14.  Hopefully I/someone can find a way
to reproduce a lockup and track it down.


> Aye, sysrq_handle_term sets console_loglevel to 8 and leaves it there.

That is good to know thanks.

Eric



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.4.14-pre8 Alt-SysRq-[TM] failure during lockup...
  2001-11-06  2:41   ` Eric W. Biederman
@ 2001-11-06 19:10     ` Randy.Dunlap
  2001-11-07  3:49       ` Keith Owens
  0 siblings, 1 reply; 6+ messages in thread
From: Randy.Dunlap @ 2001-11-06 19:10 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel

"Eric W. Biederman" wrote:
> 
> "Randy.Dunlap" <rddunlap@osdl.org> writes:
> 
> > "Eric W. Biederman" wrote:
> > >
> > > Summary:  I triggered a condition in 2.4.14-pre8 where SysRq triggered
> > > but would not print reports.  I managed to unstick the condition but
> > > had played to much to determine the root cause.  My guess is that
> > > somehow my default loglevel was messed up.  Full information is
> > > provided just case I did not muddy the waters too much.
> >
> > Do you know what the console loglevel was when you tried
> > to use Alt-SysRq-M (show_mem) or Alt-SysRq-T (show tasks ==
> > show_state)?  (first value listed in /proc/sys/kernel/printk file)
> 
> I was in single user mode so it shouldn't have been changed
> from it's default value.  But it might have been.

Eric,

I have seen some distro(s) in which either syslogd or klogd (I'm
guessing) uses the syslog syscall to change the console loglevel
value in the kernel.
This could still happen in single user mode, couldn't it?

It bugged me because I often use the "debug" boot parameter
to set console_loglevel to 10, but all of a sudden it had been
set back to 6 IIRC!  And right now on one of my test
systems it is set to 0 according to /proc/sys/kernel/printk,
although _I_ didn't ask for it to be changed to 0, and
I haven't been able to find what's changing it to 0, since
it was 10 during init/main.c.

~Randy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.4.14-pre8 Alt-SysRq-[TM] failure during lockup...
  2001-11-06 19:10     ` Randy.Dunlap
@ 2001-11-07  3:49       ` Keith Owens
  2001-11-12 17:55         ` rddunlap
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Owens @ 2001-11-07  3:49 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Eric W. Biederman, linux-kernel

On Tue, 06 Nov 2001 11:10:54 -0800, 
"Randy.Dunlap" <rddunlap@osdl.org> wrote:
>It bugged me because I often use the "debug" boot parameter
>to set console_loglevel to 10, but all of a sudden it had been
>set back to 6 IIRC!  And right now on one of my test
>systems it is set to 0 according to /proc/sys/kernel/printk,
>although _I_ didn't ask for it to be changed to 0, and
>I haven't been able to find what's changing it to 0, since
>it was 10 during init/main.c.

Any chance that console_loglevel, default_message_loglevel,
minimum_console_loglevel and default_console_loglevel are not together
in memory?  I see that the patch from Jesper Juhl <juhl@eisenstein.dk>
to fix this bug has not gone into the kernel yet.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: 2.4.14-pre8 Alt-SysRq-[TM] failure during lockup...
  2001-11-07  3:49       ` Keith Owens
@ 2001-11-12 17:55         ` rddunlap
  0 siblings, 0 replies; 6+ messages in thread
From: rddunlap @ 2001-11-12 17:55 UTC (permalink / raw)
  To: Keith Owens; +Cc: Eric W. Biederman, linux-kernel

On Wed, 7 Nov 2001, Keith Owens wrote:

| On Tue, 06 Nov 2001 11:10:54 -0800,
| "Randy.Dunlap" <rddunlap@osdl.org> wrote:
| >It bugged me because I often use the "debug" boot parameter
| >to set console_loglevel to 10, but all of a sudden it had been
| >set back to 6 IIRC!  And right now on one of my test
| >systems it is set to 0 according to /proc/sys/kernel/printk,
| >although _I_ didn't ask for it to be changed to 0, and
| >I haven't been able to find what's changing it to 0, since
| >it was 10 during init/main.c.
|
| Any chance that console_loglevel, default_message_loglevel,
| minimum_console_loglevel and default_console_loglevel are not together
| in memory?  I see that the patch from Jesper Juhl <juhl@eisenstein.dk>
| to fix this bug has not gone into the kernel yet.
|

That would have been an acceptable analysis and solution to me,
but alas, I checked, and that's not the problem.

Thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2001-11-12 17:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-11-05  5:40 2.4.14-pre8 Alt-SysRq-[TM] failure during lockup Eric W. Biederman
2001-11-05 18:37 ` Randy.Dunlap
2001-11-06  2:41   ` Eric W. Biederman
2001-11-06 19:10     ` Randy.Dunlap
2001-11-07  3:49       ` Keith Owens
2001-11-12 17:55         ` rddunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).